Parallelize Filling of Degree-Lowering Table #284
(instead of vec of indices)
Good stuff! I've left a few nits & questions inline.
I'd like to point out that in the screenshots, the number of iterations differs: 292 in the sequential case and 347 in the parallel case. This gives the following mean times & improvements:
Excellent work.
I benchmarked this on our reference machine, and it gives an additional 2 kHz on top of #283 :) After the #283 branch was rebased against this branch:
This PR modifies the auto-generated code for filling the degree-lowering table, essentially upgrading it from sequential to parallel iteration. To achieve this, the master base and extension tables are split into left and right parts at the column index separating the original columns from the degree-lowering columns. The rows are then iterated over in parallel, and within each row the degree-lowering cells are filled left to right. This is sound because no value in a degree-lowering row depends on values to its right.
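To illustrate the split-then-fill idea for constraints that only touch a single row, here is a minimal std-only sketch. It assumes a hypothetical table stored as `Vec<Vec<u64>>` (one `Vec` per row), a hypothetical boundary index `num_original`, and a made-up product rule in `fill_row_single` standing in for the generated code; it also uses `std::thread::scope` over row chunks rather than whatever parallel iterator the PR uses. `split_at_mut` gives disjoint borrows of the original cells (read) and the degree-lowering cells (written) within the same row.

```rust
use std::thread;

/// Hypothetical single-row rule: each degree-lowering cell is the product of
/// the cell to its left and some original cell. It reads only original cells
/// and cells to its left, so filling left to right is always valid.
fn fill_row_single(original: &[u64], lowering: &mut [u64]) {
    let mut left = original[0];
    for (j, cell) in lowering.iter_mut().enumerate() {
        *cell = left.wrapping_mul(original[j % original.len()]);
        left = *cell;
    }
}

/// Fills the degree-lowering columns of every row, distributing row chunks
/// across scoped threads. Assumes 1 <= `num_original` <= row width.
fn fill_single_row_parallel(table: &mut [Vec<u64>], num_original: usize, num_threads: usize) {
    let chunk = table.len().div_ceil(num_threads.max(1)).max(1);
    thread::scope(|s| {
        for rows in table.chunks_mut(chunk) {
            s.spawn(move || {
                for row in rows.iter_mut() {
                    // Split at the boundary: left = original, right = degree-lowering.
                    let (original, lowering) = row.split_at_mut(num_original);
                    fill_row_single(original, lowering);
                }
            });
        }
    });
}
```

Each thread owns a disjoint chunk of rows, so no two threads ever write to the same row; within a row, the split keeps the read-only left part and the written right part as separate borrows.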
A complication arises when filling the degree-lowering columns for transition constraints, because those degree-lowering values depend on two rows: current and next. The solution is already implied by the AIR constraints, which ensure that all degree-lowering values live in the current row. Therefore, by iterating in parallel over single rows of the degree-lowering part together with overlapping row pairs of the original part, one can fill the former left to right. Note that the overlapping row pairs do not interfere with parallel execution because they are immutable references. Rust disallows multiple mutable references to the same data, but here only the right part of the table is mutable, and each iteration selects exactly one row of it.
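The pairing of one mutable degree-lowering row with an immutable, overlapping pair of original rows can be sketched with std only. This assumes the two parts are already held as separate hypothetical `Vec<Vec<u64>>` matrices of equal height, uses a made-up rule in `fill_transition_row`, and replaces a parallel iterator with `std::thread::scope` over chunks. As an assumption of this sketch, the last degree-lowering row has no successor and is left untouched. Neighbouring chunks share a boundary row of `original`, which is fine because that sharing is read-only.

```rust
use std::thread;

/// Hypothetical transition rule: reads the current and next original rows and
/// writes only the current row's degree-lowering cells, left to right.
fn fill_transition_row(current: &[u64], next: &[u64], out: &mut [u64]) {
    let mut left = current[0];
    for (j, cell) in out.iter_mut().enumerate() {
        *cell = left.wrapping_mul(next[j % next.len()]);
        left = *cell;
    }
}

/// Sequential reference: degree-lowering row i is zipped with the window (i, i+1).
fn fill_transition_sequential(original: &[Vec<u64>], lowering: &mut [Vec<u64>]) {
    for (out, pair) in lowering.iter_mut().zip(original.windows(2)) {
        fill_transition_row(&pair[0], &pair[1], out);
    }
}

/// Parallel version: mutable chunks of `lowering` are disjoint, while the
/// matching slices of `original` overlap by one row and are only read.
fn fill_transition_parallel(original: &[Vec<u64>], lowering: &mut [Vec<u64>], num_threads: usize) {
    assert_eq!(original.len(), lowering.len(), "both parts must have the same height");
    let num_pairs = original.len().saturating_sub(1);
    let chunk = num_pairs.div_ceil(num_threads.max(1)).max(1);
    thread::scope(|s| {
        for (c, out_chunk) in lowering[..num_pairs].chunks_mut(chunk).enumerate() {
            let start = c * chunk;
            // One extra row so the chunk's last window has its "next" row.
            let rows = &original[start..start + out_chunk.len() + 1];
            s.spawn(move || {
                for (out, pair) in out_chunk.iter_mut().zip(rows.windows(2)) {
                    fill_transition_row(&pair[0], &pair[1], out);
                }
            });
        }
    });
}
```

For any thread count, `fill_transition_parallel` should produce the same matrix as `fill_transition_sequential`, since each degree-lowering row depends only on its own window of the original part.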
Much of the kudos goes to @jan-ferdinand who came up with the blueprint for the solution.
On my desktop machine, benchmarking `prove_halt`:
sequential:
parallel: