This is a prototype task scheduler, written in Rust, that is driven by an abstract event loop, along with the beginnings of an I/O interface for translating asynchronous I/O to synchronous I/O via scheduler context switches. It is part of issue #4419.
It is not ready to merge, but this is an opportunity to review and discuss.
While I am primarily interested in proving the integration of I/O with the scheduler, this is also written with a number of other goals in mind:
What is implemented:
Unimplemented:
Scheduler design
I am trying to use ownership to make the relationships between scheduler types clear, and as a result this scheduler is structured quite differently than the current one.
The core idea here is that tasks are owned, and code that owns a task is free to schedule it. During the lifetime of a task, ownership transfers between schedulers and the objects on which the task is blocked. The state of a task (blocked vs. running vs. dead, etc.) is encoded in its ownership.
For comparison, in the existing scheduler, the task is (basically) owned by a scheduler, but is atomically reference counted so other entities (like pipes that a task is blocked on) occasionally hold pointers to a task. The resulting task lifecycle is quite complicated, and I think unnecessarily so.
I believe this model works with pipes very well, though I don't have a complete understanding of pipes yet.
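To make the ownership idea concrete, here is a minimal sketch, not the actual code in this patch; the types and method names are invented for illustration. A task is a plain owned value, so "who owns it" is its state: in a run queue it is runnable, owned by a pipe it is blocked, dropped it is dead.

```rust
// Sketch only: no reference counting, the task's lifecycle is implied by
// which object currently owns the Box<Task>.

struct Task {
    name: String,
    // In the real scheduler this would also hold the stack and saved context.
}

struct Scheduler {
    run_queue: Vec<Box<Task>>,
}

impl Scheduler {
    // Whoever owns a task may hand it to a scheduler, making it runnable.
    fn schedule(&mut self, task: Box<Task>) {
        self.run_queue.push(task);
    }

    // Removing the task transfers ownership to the caller, e.g. to a pipe
    // the task is about to block on; the task is "blocked" by construction.
    fn take_for_blocking(&mut self) -> Option<Box<Task>> {
        self.run_queue.pop()
    }
}

fn main() {
    let mut sched = Scheduler { run_queue: Vec::new() };
    sched.schedule(Box::new(Task { name: "t1".to_string() }));

    // A pipe (not modelled here) now owns the task while it waits for data.
    let blocked = sched.take_for_blocking().unwrap();

    // Waking the task up is simply giving ownership back to a scheduler.
    println!("rescheduling {}", blocked.name);
    sched.schedule(blocked);
}
```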
You'll notice this code is still written as a very big test case, for ease of development.
The most important submodules: Scheduler and Task, and io for libuv.

In this iteration, Scheduler is very simple, mostly providing a few context-switching primitives, and I like that, so I'll probably break it into multiple types: one as currently written, another dealing with scheduler policy and multithreading. In regards to multithreading, Schedulers are intended to be implemented as actors, mostly dealing with single-threaded state, then occasionally communicating with other schedulers through messages (the current implementation relies more on shared state, locks and signals).

rt::io is the runtime's internal, abstract I/O interface. It is used entirely as opaque ~objects and is intended to support yet other, user-facing I/O modules. The I/O interface here should be considered a proof of concept, as a real design will require a lot of consideration.
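As a rough illustration of the async-to-sync translation described at the top (a sketch with invented names, not the interface in this patch), the synchronous read a task sees can be built by handing ownership of the task to an I/O callback and rescheduling it when the event loop completes the request:

```rust
use std::collections::VecDeque;

// Callback run by the event loop when an I/O request completes.
type Callback = Box<dyn FnOnce(&mut EventLoop)>;

struct Task {
    // The real task carries a saved register context; a name stands in here.
    name: String,
}

struct EventLoop {
    // Completed I/O requests whose callbacks are waiting to run.
    completed: VecDeque<Callback>,
    // Tasks that have been made runnable again.
    run_queue: VecDeque<Task>,
}

impl EventLoop {
    fn new() -> EventLoop {
        EventLoop { completed: VecDeque::new(), run_queue: VecDeque::new() }
    }

    // The asynchronous primitive: register a callback and return immediately.
    fn async_read(&mut self, on_complete: Callback) {
        self.completed.push_back(on_complete);
    }

    fn run(&mut self) {
        while let Some(cb) = self.completed.pop_front() {
            cb(self);
        }
    }
}

// The synchronous wrapper a task would call: the task is descheduled and
// ownership of it moves into the I/O request; the callback hands it back
// to the scheduler once the read has finished.
fn blocking_read(ev: &mut EventLoop, current: Task) {
    ev.async_read(Box::new(move |ev: &mut EventLoop| {
        println!("read complete, waking {}", current.name);
        ev.run_queue.push_back(current);
    }));
}

fn main() {
    let mut ev = EventLoop::new();
    blocking_read(&mut ev, Task { name: "reader".to_string() });
    ev.run();
    assert_eq!(ev.run_queue.len(), 1);
}
```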
Measurements

I've only done one very simple measurement, comparing pure TCP read performance between node 0.6 and this scheduler. It indicates this code is about on par with node. I take that as a good sign, though you might expect us to beat node, given its dynamic-language overhead and the various extra abstractions in its libraries compared to this simple Rust code. perf indicates we spend about 50% of the time in the kernel, then the usual suspects: upcall_call_shim_on_c_stack, pthread_getspecific, malloc - things that can be tackled in increments.

Note: the Rust code used a much smaller buffer than node likely uses. Adjusting the buffer size to 64K reduces the userspace time significantly.
Concerns
~objects don't work
I don't actually remember whether I tested them, but as I understand it they don't work correctly, so I've inserted some placeholder typedefs to make the code look like it's using ~objects when it is not. Hopefully the conversion won't be too hard. Until ~objects work, uv needs to live in core.
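The placeholder approach could look roughly like the following (an invented example, not the code in this patch): alias a concrete type where an owned trait object is wanted, so call sites already read as if they were going through the abstract interface and can be switched later with little churn.

```rust
// Sketch of the placeholder-typedef idea. `Reader` stands in for the
// abstract interface; `UvStream` for the concrete libuv-backed type.

trait Reader {
    fn read(&mut self, buf: &mut [u8]) -> usize;
}

struct UvStream;

impl Reader for UvStream {
    fn read(&mut self, _buf: &mut [u8]) -> usize {
        0 // pretend nothing was read
    }
}

// Until owned trait objects work, point the alias at the concrete type...
type ReaderObject = UvStream;
// ...and later flip it to the owned trait object:
// type ReaderObject = Box<dyn Reader>;

fn consume(mut reader: ReaderObject) -> usize {
    let mut buf = [0u8; 16];
    reader.read(&mut buf)
}

fn main() {
    println!("read {} bytes", consume(UvStream));
}
```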
Allocation
There are a lot of small allocations here, particularly in the uv bindings, which rely heavily on owned closures, but also in the io interface that uses objects. Because it is much more pleasant to just use closures everywhere than to figure out exactly how to thread data around without allocating, I think this tradeoff is good in the short term. The particularly bad allocations can be optimized as needed. The ~object allocations are considerably harder to eliminate.
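To make the tradeoff concrete, here is an illustrative (invented) comparison of the boxed-closure style against an allocation-free generic version: the boxed form heap-allocates per callback but is much easier to store and pass through layers.

```rust
// Each registered callback heap-allocates its closure, which is convenient:
// any environment can be captured and the callbacks stored uniformly.
fn register_boxed(callbacks: &mut Vec<Box<dyn FnMut(u32)>>, label: String) {
    callbacks.push(Box::new(move |code| {
        println!("{}: event {}", label, code);
    }));
}

// The allocation-free alternative: the closure's concrete type must be
// threaded through as a generic parameter everywhere it is used or stored.
fn run_inline<F: FnMut(u32)>(mut callback: F) {
    callback(42);
}

fn main() {
    let mut callbacks: Vec<Box<dyn FnMut(u32)>> = Vec::new();
    register_boxed(&mut callbacks, "uv read".to_string());
    for cb in callbacks.iter_mut() {
        cb(7);
    }
    run_inline(|code| println!("inline event {}", code));
}
```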
What happens when I/O types are sent?

We talked about this in the meeting last week and it's a big problem. I now think that I/O types (those defined in rt::io and that form the basis for any I/O APIs) must be sendable, the reason being that one connection per task is going to be the right way to do networking, at least for the near future. The basic server will listen, accept, then capture the connection in a new task.

So, assuming that I/O types are sendable, there is going to be a lot of complexity in making those I/O types ensure that the task they are currently associated with is running on the correct scheduler. Importantly, task pinning is not sufficient, since the I/O types are bound to a specific scheduler, not a task. Nor will it be sufficient to simply fail if an I/O type migrates to the wrong scheduler, because we have no way to prevent that from happening accidentally (that I can see). It's going to be ugly, and could involve polluting our nice single-threaded I/O paths with some concessions to memory synchronization.
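One possible shape for that check (purely hypothetical; all names are invented) is for each I/O handle to remember the scheduler it was created on and verify, on every operation, that the calling task is currently running there, either re-pinning the task or reporting the mismatch:

```rust
// Sketch of a "home scheduler" check on a sendable I/O handle.

#[derive(Clone, Copy, PartialEq, Debug)]
struct SchedulerId(u32);

struct TcpStreamHandle {
    // The scheduler whose event loop owns the underlying uv handle.
    home: SchedulerId,
}

impl TcpStreamHandle {
    fn read(&mut self, running_on: SchedulerId, buf: &mut [u8]) -> Result<usize, String> {
        if running_on != self.home {
            // A real design would have to migrate the task back to
            // `self.home` here instead of failing.
            return Err(format!(
                "handle belongs to {:?} but the task is running on {:?}",
                self.home, running_on
            ));
        }
        buf[0] = 0; // pretend one byte was read
        Ok(1)
    }
}

fn main() {
    let mut stream = TcpStreamHandle { home: SchedulerId(0) };
    let mut buf = [0u8; 8];
    // Fine while the task stays on the handle's home scheduler...
    assert!(stream.read(SchedulerId(0), &mut buf).is_ok());
    // ...and detected after the handle has been sent to a task elsewhere.
    assert!(stream.read(SchedulerId(1), &mut buf).is_err());
}
```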
Selecting
Related to the above point about one connection per task, we need to be able to make select work with various I/O requests, particularly for reading and listening. Not only that, it has to be possible to select on both pipes and I/O types simultaneously (so that I can listen both for incoming connections and for a signal to stop listening for them). I have no idea yet how to do this, nor do I even know how pipes implement this currently.
Adapting work stealing to non-strict parallel computations

The work-stealing algorithm described in 'Scheduling Multithreaded Computations by Work Stealing' is for 'strict' (fork/join style) computations, which is not what we have. We can do something ad hoc to add randomness, but I'd rather have something known to work. I do think the work-stealing approach makes a lot of sense for us, especially now that we have I/O callbacks that need to be scheduled with high priority (so cold CPU-bound tasks get stolen to schedulers not doing I/O).
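The scheduling intuition in that last sentence, sketched very loosely (a single-threaded stand-in with invented names, not a real concurrent deque): the owning scheduler pops its newest work, while an idle scheduler steals the oldest, coldest task from a victim, leaving I/O callbacks to run promptly where the event loop lives.

```rust
use std::collections::VecDeque;

// Toy illustration of the stealing discipline only; real work stealing
// needs a concurrent deque and random victim selection across threads.
struct Sched {
    queue: VecDeque<&'static str>,
}

impl Sched {
    // The owner works LIFO: newest task first, which keeps caches warm.
    fn pop_local(&mut self) -> Option<&'static str> {
        self.queue.pop_back()
    }
    // A thief steals FIFO: the oldest, likely CPU-bound, task.
    fn steal(&mut self) -> Option<&'static str> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut io_sched = Sched { queue: VecDeque::from(["cpu-task", "io-callback"]) };
    let mut idle_sched = Sched { queue: VecDeque::new() };

    // The idle scheduler takes the cold CPU-bound task, so the scheduler
    // that owns the event loop can run its I/O callback with low latency.
    if let Some(stolen) = io_sched.steal() {
        idle_sched.queue.push_back(stolen);
    }
    assert_eq!(io_sched.pop_local(), Some("io-callback"));
    assert_eq!(idle_sched.pop_local(), Some("cpu-task"));
}
```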
Fairness and I/O timeouts
Right now we have a model that does not enforce any sort of time accounting. I kind of like this for performance and simplicity, but it means that if we mix CPU-bound and I/O-bound tasks on a single scheduler, the CPU-bound work may run without ever yielding to do I/O. I suspect that with this merging of I/O and the scheduler we are going to end up wanting some utility functions for setting up groups of schedulers specifically for I/O or specifically for processing.
Next
The immediate goal, pending feedback, will be to get this patch into core. Beyond that there are a number of parallel development paths, primarily I/O design, multithreading and integration.
Begin working on user-facing I/O library (#4248)
This just barely scratches the surface of I/O. Designing an I/O library is itself a huge effort. I think I would like to approach this top down: figure out how we want a synchronous I/O library to be designed, then how to connect it to uv through rt::io.

When doing this design it's important to consider Rust's unique constraints, in particular the relationship of pipes and the scheduler to I/O. Pipes and I/O readers/writers share a lot in common and need to interoperate in various ways (particularly with select). We also need to consider this in the context of the existing core::io - what is or isn't working there.

I'm no expert on I/O libraries, so I think I'd like to hash this out on the mailing lists, perhaps in conjunction with a survey of other languages' I/O.
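As a purely illustrative sketch of that top-down direction (all names are invented and this is not a proposal for the actual API): the user-facing type looks fully synchronous and simply delegates to the runtime's abstract stream, which is where a descheduling read like the one in rt::io would plug in.

```rust
// The runtime-facing abstraction (a stand-in for whatever rt::io exposes).
trait RtioStream {
    fn read(&mut self, buf: &mut [u8]) -> usize;
}

// A dummy backend so the example runs without a real event loop.
struct LoopbackStream {
    data: Vec<u8>,
}

impl RtioStream for LoopbackStream {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data.drain(..n);
        n
    }
}

// The user-facing type: a synchronous-looking stream that owns an
// abstract runtime stream and forwards to it.
struct TcpStream {
    inner: Box<dyn RtioStream>,
}

impl TcpStream {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        // In the real runtime this call is where the task would be
        // descheduled until the event loop completes the request.
        self.inner.read(buf)
    }
}

fn main() {
    let mut stream = TcpStream {
        inner: Box::new(LoopbackStream { data: b"hello".to_vec() }),
    };
    let mut buf = [0u8; 8];
    let n = stream.read(&mut buf);
    println!("read {} bytes: {:?}", n, &buf[..n]);
}
```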
Multithreading
Integration
- start lang item (Rewrite rust_start in Rust #3406) - so we can start running test cases on this scheduler
- move stack handling into rt::stack and not into the task itself. Start by adapting the current scheme to the new scheduler, but consider potential upcoming changes to the FFI.
- memory_region.

I want to hold off on adding linked failure because I think the current design is still too complex, and the implementation imposes some undesirable constraints on the scheduler.