This is a prototype task scheduler, written in Rust, that is driven by an abstract event loop, along with the beginnings of an I/O interface for translating asynchronous I/O to synchronous I/O via scheduler context switches. It is part of issue #4419.
It is not ready to merge, but this is an opportunity to review and discuss.
While I am primarily interested in proving the integration of I/O with the scheduler, this is also written with a number of other goals in mind:
What is implemented:
Unimplemented:
Scheduler design
I am trying to use ownership to make the relationships between scheduler types clear, and as a result this scheduler is structured quite differently than the current one.
The core idea here is that tasks are owned, and code that owns a task is free to schedule it. During the lifetime of a task, ownership transfers between schedulers and the objects on which the task is blocked. The state of a task (blocked vs. running vs. dead, etc.) is encoded in its ownership.
For comparison, in the existing scheduler, the task is (basically) owned by a scheduler, but is atomically reference counted so other entities (like pipes that a task is blocked on) occasionally hold pointers to a task. The resulting task lifecycle is quite complicated, and I think unnecessarily so.
I believe this model works with pipes very well, though I don't have a complete understanding of pipes yet.
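To make the ownership idea concrete, here is a minimal sketch, not the actual code in this patch; the types and method names are invented for illustration. A task is a plain owned value, so "who owns it" is its state: in a run queue it is runnable, owned by a pipe it is blocked, dropped it is dead.

```rust
// Sketch only: no reference counting, the task's lifecycle is implied by
// which object currently owns the Box<Task>.

struct Task {
    name: String,
    // In the real scheduler this would also hold the stack and saved context.
}

struct Scheduler {
    run_queue: Vec<Box<Task>>,
}

impl Scheduler {
    // Whoever owns a task may hand it to a scheduler, making it runnable.
    fn schedule(&mut self, task: Box<Task>) {
        self.run_queue.push(task);
    }

    // Removing the task transfers ownership to the caller, e.g. to a pipe
    // the task is about to block on; the task is "blocked" by construction.
    fn take_for_blocking(&mut self) -> Option<Box<Task>> {
        self.run_queue.pop()
    }
}

fn main() {
    let mut sched = Scheduler { run_queue: Vec::new() };
    sched.schedule(Box::new(Task { name: "t1".to_string() }));

    // A pipe (not modelled here) now owns the task while it waits for data.
    let blocked = sched.take_for_blocking().unwrap();

    // Waking the task up is simply giving ownership back to a scheduler.
    println!("rescheduling {}", blocked.name);
    sched.schedule(blocked);
}
```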
You'll notice this code is still written as a very big test case, for ease of development.
The most important submodules: Scheduler and Task, and io for libuv.

In this iteration, Scheduler is very simple, mostly providing a few context-switching primitives, and I like that, so I'll probably break it into multiple types: one as currently written, another dealing with scheduler policy and multithreading. In regards to multithreading, Schedulers are intended to be implemented as actors, mostly dealing with single-threaded state, then occasionally communicating with other schedulers through messages (the current implementation relies more on shared state, locks and signals).

rt::io is the runtime's internal, abstract I/O interface. It is used entirely as opaque ~objects and is intended to support yet other, user-facing I/O modules. The I/O interface here should be considered a proof of concept, as a real design will require a lot of consideration.
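As a rough illustration of the async-to-sync translation described at the top (a sketch with invented names, not the interface in this patch), the synchronous read a task sees can be built by handing ownership of the task to an I/O callback and rescheduling it when the event loop completes the request:

```rust
use std::collections::VecDeque;

// Callback run by the event loop when an I/O request completes.
type Callback = Box<dyn FnOnce(&mut EventLoop)>;

struct Task {
    // The real task carries a saved register context; a name stands in here.
    name: String,
}

struct EventLoop {
    // Completed I/O requests whose callbacks are waiting to run.
    completed: VecDeque<Callback>,
    // Tasks that have been made runnable again.
    run_queue: VecDeque<Task>,
}

impl EventLoop {
    fn new() -> EventLoop {
        EventLoop { completed: VecDeque::new(), run_queue: VecDeque::new() }
    }

    // The asynchronous primitive: register a callback and return immediately.
    fn async_read(&mut self, on_complete: Callback) {
        self.completed.push_back(on_complete);
    }

    fn run(&mut self) {
        while let Some(cb) = self.completed.pop_front() {
            cb(self);
        }
    }
}

// The synchronous wrapper a task would call: the task is descheduled and
// ownership of it moves into the I/O request; the callback hands it back
// to the scheduler once the read has finished.
fn blocking_read(ev: &mut EventLoop, current: Task) {
    ev.async_read(Box::new(move |ev: &mut EventLoop| {
        println!("read complete, waking {}", current.name);
        ev.run_queue.push_back(current);
    }));
}

fn main() {
    let mut ev = EventLoop::new();
    blocking_read(&mut ev, Task { name: "reader".to_string() });
    ev.run();
    assert_eq!(ev.run_queue.len(), 1);
}
```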
Measurements

I've only done one very simple measurement, comparing pure TCP read performance between node 0.6 and this scheduler. It indicates this code is about on par with node. I take that as a good sign, though you might expect us to beat node, given its dynamic-language overhead and the various extra abstractions in its libraries compared to this simple Rust code. perf indicates we spend about 50% of the time in the kernel, then the usual suspects: upcall_call_shim_on_c_stack, pthread_getspecific, malloc - things that can be tackled in increments.

Note: the Rust code used a much smaller buffer than node likely uses. Adjusting the buffer size to 64K reduces the userspace time significantly.
Concerns
~objects don't work
I don't actually remember whether I tested them, but as I understand it they don't work correctly, so I've inserted some placeholder typedefs to make the code look like it's using ~objects when it is not. Hopefully the conversion won't be too hard. Until ~objects work, uv needs to live in core.
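The placeholder approach could look roughly like the following (an invented example, not the code in this patch): alias a concrete type where an owned trait object is wanted, so call sites already read as if they were going through the abstract interface and can be switched later with little churn.

```rust
// Sketch of the placeholder-typedef idea. `Reader` stands in for the
// abstract interface; `UvStream` for the concrete libuv-backed type.

trait Reader {
    fn read(&mut self, buf: &mut [u8]) -> usize;
}

struct UvStream;

impl Reader for UvStream {
    fn read(&mut self, _buf: &mut [u8]) -> usize {
        0 // pretend nothing was read
    }
}

// Until owned trait objects work, point the alias at the concrete type...
type ReaderObject = UvStream;
// ...and later flip it to the owned trait object:
// type ReaderObject = Box<dyn Reader>;

fn consume(mut reader: ReaderObject) -> usize {
    let mut buf = [0u8; 16];
    reader.read(&mut buf)
}

fn main() {
    println!("read {} bytes", consume(UvStream));
}
```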
Allocation
There are a lot of small allocations here, particularly in the uv bindings, which rely heavily on owned closures, but also in the io interface that uses objects. Because it is much more pleasant to just use closures everywhere than to figure out exactly how to thread data around without allocating, I think this tradeoff is good in the short term. The particularly bad allocations can be optimized as needed. The ~object allocations are considerably harder to eliminate.
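To make the tradeoff concrete, here is an illustrative (invented) comparison of the boxed-closure style against an allocation-free generic version: the boxed form heap-allocates per callback but is much easier to store and pass through layers.

```rust
// Each registered callback heap-allocates its closure, which is convenient:
// any environment can be captured and the callbacks stored uniformly.
fn register_boxed(callbacks: &mut Vec<Box<dyn FnMut(u32)>>, label: String) {
    callbacks.push(Box::new(move |code| {
        println!("{}: event {}", label, code);
    }));
}

// The allocation-free alternative: the closure's concrete type must be
// threaded through as a generic parameter everywhere it is used or stored.
fn run_inline<F: FnMut(u32)>(mut callback: F) {
    callback(42);
}

fn main() {
    let mut callbacks: Vec<Box<dyn FnMut(u32)>> = Vec::new();
    register_boxed(&mut callbacks, "uv read".to_string());
    for cb in callbacks.iter_mut() {
        cb(7);
    }
    run_inline(|code| println!("inline event {}", code));
}
```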
What happens when I/O types are sent?

We talked about this in the meeting last week and it's a big problem. I now think that I/O types (those defined in rt::io and that form the basis for any I/O APIs) must be sendable, the reason being that one connection per task is going to be the right way to do networking, at least for the near future. The basic server will listen, accept, then capture the connection in a new task.

So, assuming that I/O types are sendable, there is going to be a lot of complexity in making those I/O types ensure that the task they are currently associated with is running on the correct scheduler. Importantly, task pinning is not sufficient, since the I/O types are bound to a specific scheduler, not a task. Nor will it be sufficient to simply fail if an I/O type migrates to the wrong scheduler, because we have no way to prevent that from happening accidentally (that I can see). It's going to be ugly, and could involve polluting our nice single-threaded I/O paths with some concessions to memory synchronization.
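One possible shape for that check (purely hypothetical; all names are invented) is for each I/O handle to remember the scheduler it was created on and verify, on every operation, that the calling task is currently running there, either re-pinning the task or reporting the mismatch:

```rust
// Sketch of a "home scheduler" check on a sendable I/O handle.

#[derive(Clone, Copy, PartialEq, Debug)]
struct SchedulerId(u32);

struct TcpStreamHandle {
    // The scheduler whose event loop owns the underlying uv handle.
    home: SchedulerId,
}

impl TcpStreamHandle {
    fn read(&mut self, running_on: SchedulerId, buf: &mut [u8]) -> Result<usize, String> {
        if running_on != self.home {
            // A real design would have to migrate the task back to
            // `self.home` here instead of failing.
            return Err(format!(
                "handle belongs to {:?} but the task is running on {:?}",
                self.home, running_on
            ));
        }
        buf[0] = 0; // pretend one byte was read
        Ok(1)
    }
}

fn main() {
    let mut stream = TcpStreamHandle { home: SchedulerId(0) };
    let mut buf = [0u8; 8];
    // Fine while the task stays on the handle's home scheduler...
    assert!(stream.read(SchedulerId(0), &mut buf).is_ok());
    // ...and detected after the handle has been sent to a task elsewhere.
    assert!(stream.read(SchedulerId(1), &mut buf).is_err());
}
```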
Selecting
Related to the above point about one connection per task, we need to be able to make select work with various I/O requests, particularly for reading and listening. Not only that, it has to be possible to select on both pipes and I/O types simultaneously (so that I can listen both for incoming connections and for a signal to stop listening for them). I have no idea yet how to do this, nor do I even know how pipes implement this currently.
Adapting work stealing to non-strict parallel computations

The work-stealing algorithm described in 'Scheduling Multithreaded Computations by Work Stealing' is for 'strict' (fork/join style) computations, which is not what we have. We can do something ad hoc to add randomness, but I'd rather have something known to work. I do think the work-stealing approach makes a lot of sense for us, especially now that we have I/O callbacks that need to be scheduled with high priority (so cold CPU-bound tasks get stolen to schedulers not doing I/O).
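The scheduling intuition in that last sentence, sketched very loosely (a single-threaded stand-in with invented names, not a real concurrent deque): the owning scheduler pops its newest work, while an idle scheduler steals the oldest, coldest task from a victim, leaving I/O callbacks to run promptly where the event loop lives.

```rust
use std::collections::VecDeque;

// Toy illustration of the stealing discipline only; real work stealing
// needs a concurrent deque and random victim selection across threads.
struct Sched {
    queue: VecDeque<&'static str>,
}

impl Sched {
    // The owner works LIFO: newest task first, which keeps caches warm.
    fn pop_local(&mut self) -> Option<&'static str> {
        self.queue.pop_back()
    }
    // A thief steals FIFO: the oldest, likely CPU-bound, task.
    fn steal(&mut self) -> Option<&'static str> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut io_sched = Sched { queue: VecDeque::from(["cpu-task", "io-callback"]) };
    let mut idle_sched = Sched { queue: VecDeque::new() };

    // The idle scheduler takes the cold CPU-bound task, so the scheduler
    // that owns the event loop can run its I/O callback with low latency.
    if let Some(stolen) = io_sched.steal() {
        idle_sched.queue.push_back(stolen);
    }
    assert_eq!(io_sched.pop_local(), Some("io-callback"));
    assert_eq!(idle_sched.pop_local(), Some("cpu-task"));
}
```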
Fairness and I/O timeouts
Right now we have a model that does not enforce any sort of time accounting. I kind of like this for performance and simplicity, but it means that if we mix CPU-bound and I/O-bound tasks on a single scheduler, the CPU-bound work may run without ever yielding to do I/O. I suspect that with this merging of I/O and the scheduler we are going to end up wanting some utility functions for setting up groups of schedulers specifically for I/O or specifically for processing.
Next
The immediate goal, pending feedback, will be to get this patch into core. Beyond that there are a number of parallel development paths, primarily I/O design, multithreading and integration.
Begin working on user-facing I/O library (#4248)
This just barely scratches the surface of I/O. Designing an I/O library is itself a huge effort. I think I would like to approach this top down: figure out how we want a synchronous I/O library to be designed, then how to connect it to uv through rt::io.

When doing this design it's important to consider Rust's unique constraints, in particular the relationship of pipes and the scheduler to I/O. Pipes and I/O readers/writers share a lot in common and need to interoperate in various ways (particularly with select). We also need to consider this in the context of the existing core::io - what is or isn't working there.

I'm no expert on I/O libraries, so I think I'd like to hash this out on the mailing lists, perhaps in conjunction with a survey of other languages' I/O.
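As a purely illustrative sketch of that top-down direction (all names are invented and this is not a proposal for the actual API): the user-facing type looks fully synchronous and simply delegates to the runtime's abstract stream, which is where a descheduling read like the one in rt::io would plug in.

```rust
// The runtime-facing abstraction (a stand-in for whatever rt::io exposes).
trait RtioStream {
    fn read(&mut self, buf: &mut [u8]) -> usize;
}

// A dummy backend so the example runs without a real event loop.
struct LoopbackStream {
    data: Vec<u8>,
}

impl RtioStream for LoopbackStream {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data.drain(..n);
        n
    }
}

// The user-facing type: a synchronous-looking stream that owns an
// abstract runtime stream and forwards to it.
struct TcpStream {
    inner: Box<dyn RtioStream>,
}

impl TcpStream {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        // In the real runtime this call is where the task would be
        // descheduled until the event loop completes the request.
        self.inner.read(buf)
    }
}

fn main() {
    let mut stream = TcpStream {
        inner: Box::new(LoopbackStream { data: b"hello".to_vec() }),
    };
    let mut buf = [0u8; 8];
    let n = stream.read(&mut buf);
    println!("read {} bytes: {:?}", n, &buf[..n]);
}
```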
Multithreading
Integration
- start lang item (Rewrite rust_start in Rust #3406) - so we can start running test cases on this scheduler
- move stack handling into rt::stack and not into the task itself. Start by adapting the current scheme to the new scheduler, but consider potential upcoming changes to the FFI.
- memory_region.

I want to hold off on adding linked failure because I think the current design is still too complex, and the implementation imposes some undesirable constraints on the scheduler.