Implement cancellation on disconnect, & parallelize Helper aggregation computations. #3119
A couple of things to pay attention to in review:
Part of #3117.
I haven't read the changes closely yet, but I have some early feedback based on testing.
As for tracing spans: we will want to create a span, make it a child of the current span, and pass it to the closure currently used in `ParallelIterator::map()` (similar to before). There, we should enter the span at the top of the closure. This will ensure our events logged inside the closure have an enclosing span. Note that the span documentation says "it is entirely valid for multiple threads to enter the same span concurrently", so it's okay to share one child span. We can pass it as the initialization argument of `map_with()`, so it gets cloned once per worker thread.
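A minimal sketch of that suggestion, using `rayon` and `tracing` directly; `evaluate_vdaf` and the numeric report shares are hypothetical stand-ins for the real per-report-share work, which isn't shown here:

```rust
use rayon::prelude::*;
use tracing::info_span;

// Hypothetical stand-in for the per-report-share VDAF evaluation.
fn evaluate_vdaf(report_share: &u64) -> u64 {
    report_share.wrapping_mul(2)
}

fn aggregate(report_shares: &[u64]) -> Vec<u64> {
    // Created here, the span picks up the current span as its parent.
    let span = info_span!("aggregate_report_shares");

    report_shares
        .par_iter()
        // `map_with` clones the span once per worker thread; tracing permits
        // multiple threads to enter the same span concurrently.
        .map_with(span, |span, report_share| {
            // Enter at the top of the closure so events logged during the
            // VDAF evaluation have an enclosing span.
            let _entered = span.enter();
            evaluate_vdaf(report_share)
        })
        .collect()
}
```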
Previously, we used a rayon threadpool for aggregation computations, but the computations within the handling of a single aggregation job were still serialized. Now, we use a parallel iterator over the report shares, allowing the VDAF evaluations to be performed in parallel. Also, remove the (unstated!) requirement in `aggregation_job_writer` that report aggregations be provided in order. This eases parallelization of the aggregation job continuation logic.
This will be very helpful when we have requests that time out, since a timeout is presented to Janus as a client disconnect. Otherwise, we would continue processing the request until it is complete. Update the Helper aggregation methods, which use rayon to parallelize processing, to respect cancellation. Specifically, the `receiver` will be dropped, causing the producer's `sender` to return a `SendError`, which will cause `try_for_each_with` to stop processing.

Also:
* Receive only 10 (arbitrarily chosen) messages per call to `recv_many`. This will give more await points to cancel on during VDAF computations.
* Rename a variable.
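A sketch of this cancellation mechanism, assuming a tokio unbounded channel and hypothetical `ReportShare`/`prepare` stand-ins (the actual Janus types and persistence logic differ):

```rust
use rayon::prelude::*;
use tokio::sync::mpsc;

// Hypothetical stand-ins for the real report-share and preparation types.
type ReportShare = u64;
type PrepareResult = Result<u64, &'static str>;

fn prepare(report_share: ReportShare) -> PrepareResult {
    Ok(report_share.wrapping_add(1))
}

async fn handle_aggregation_job(report_shares: Vec<ReportShare>) {
    let (sender, mut receiver) = mpsc::unbounded_channel();

    // Producer: runs on the rayon threadpool. `try_for_each_with` clones
    // `sender` per worker thread and stops iterating as soon as any send
    // returns an error.
    rayon::spawn(move || {
        let _ = report_shares
            .into_par_iter()
            .try_for_each_with(sender, |sender, share| sender.send(prepare(share)));
    });

    // Consumer: pull at most 10 results per call, so there is an await point
    // (and thus a cancellation point) at least every 10 VDAF computations.
    let mut results = Vec::new();
    while receiver.recv_many(&mut results, 10).await > 0 {
        // ... persist `results`, then reuse the buffer ...
        results.clear();
    }

    // If this future is dropped (e.g. on client disconnect), `receiver` is
    // dropped with it; the producers' next `send` returns a `SendError`,
    // `try_for_each_with` stops, and the rayon work winds down.
}
```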
(Rebased on latest
This implements for the Leader what #3119 did for the Helper. As with the Helper, a rayon threadpool is used to parallelize aggregation computations. These computations respect cancellation (though I'm not sure if anything short of process death will currently cancel the `aggregation_job_driver`'s computations).
This will be very helpful when we have requests that time out, since a timeout is presented to Janus as a client disconnect. Otherwise, we would continue processing the request until it is complete.

Previously, we used a rayon threadpool for aggregation computations, but the computations within the handling of a single aggregation job were still serialized. Now, we use a parallel iterator over the report shares, allowing the VDAF evaluations to be performed in parallel. The Helper methods are also updated such that the parallel computations respect cancellation. Specifically, the `receiver` will be dropped on cancellation, causing the producer threads' `sender` to return a `SendError` when they attempt to send, which will cause `try_for_each_with` to stop processing, which will in turn cause the producer threads to terminate.

Also, remove the (unstated!) requirement in `aggregation_job_writer` that report aggregations be provided in order. This eases parallelization of the aggregation job continuation logic, as sketched below.
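For the ordering change, a hypothetical sketch of what an order-independent writer might look like, keying on report ID rather than relying on arrival order (this is not the actual `aggregation_job_writer` API; all types here are illustrative stand-ins):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real Janus types.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ReportId(u64);

struct ReportAggregation {
    report_id: ReportId,
    // ... aggregation state, etc. ...
}

#[derive(Default)]
struct AggregationJobWriter {
    by_report_id: HashMap<ReportId, ReportAggregation>,
}

impl AggregationJobWriter {
    /// Accepts report aggregations in any order: parallel workers can hand
    /// results back as they finish, and the writer indexes them itself
    /// instead of assuming the caller preserved the stored order.
    fn put(&mut self, report_aggregation: ReportAggregation) {
        self.by_report_id
            .insert(report_aggregation.report_id, report_aggregation);
    }
}
```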