Parallelize Filling of Degree-Lowering Table #284

aszepieniec · 2024-05-17T20:37:20Z

This PR modifies the auto-generated code for filling the degree-lowering table, essentially upgrading it from sequential to parallel iteration. To achieve this, the master base and extension tables are split into left and right parts at the column index separating original columns from degree-lowering columns. Then parallel iteration over the rows allows filling in the degree-lowering rows left to right. No value in any degree-lowering row depends on values to its right.

A complication arises when filling the degree-lowering columns for transition constraints as the degree-lowering values depend on two rows, current and next. The solution is already implied by the AIR constraints, which ensures that all degree-lowering values live in the current row. Therefore, by parallel iteration over single rows of the degree-lowering part, and overlapping row pairs of the original part, one can fill the former left to right. Note that the overlapping row pairs do not interfere with parallel execution because these are immutable references. Rust disallows multiple mutable references to the same data, but in this case only the right half of the table is mutable, and there we select one row per iteration.

Much of the kudos goes to @jan-ferdinand who came up with the blueprint for the solution.

On my desktop machine, benchmarking prove_halt --

sequential:

parallel:

(instead of vec of indices)

jan-ferdinand

Good stuff! I've left a few nits & questions inline.

constraint-evaluation-generator/src/substitution.rs

jan-ferdinand · 2024-05-17T21:09:45Z

I'd like to point out that in the screenshots, the number of iterations is different: 292 in the sequential case, 347 in the parallel. This gives the following mean times & improvements:

mean [ms]	old	new
pad	1.62	0.29	-82%
fill	1.49	0.16	-89%

Excellent work.

Sword-Smith · 2024-05-17T23:08:13Z

I benchmarked this on our reference machine, and it gives an additional 2kHz on top of #283 :)

After the #283 branch was rebased against this branch:

### Prove Fibonacci 00000000000000102400       70.35s    #Reps   Share  Category
├─Fiat-Shamir: claim                            4.26µs       1   0.00%  (hash –  0.00%)
├─derive additional parameters                 11.96µs       1   0.00%  
├─base tables                                  30.95s        1  43.99%  
│ ├─create                                      2.36s        1   3.36%  (gen  – 28.76%)
│ ├─pad                                       351.50ms       1   0.50%  (gen  –  4.28%)
│ │ ├─pad original tables                     178.15ms       1   0.25%  
│ │ └─fill degree-lowering table              173.28ms       1   0.25%  
│ ├─randomize trace                            83.18ms       1   0.12%  (gen  –  1.01%)
│ ├─LDE                                        17.56s        1  24.96%  (LDE  – 49.88%)
│ │ ├─polynomial zero-initialization            3.45µs       1   0.00%  
│ │ ├─interpolation                             2.38s        1   3.38%  
│ │ ├─resize                                  772.00ns       1   0.00%  
│ │ ├─memory warmup                           866.28ms       1   1.23%  
│ │ ├─evaluation                               14.32s        1  20.35%  
│ │ └─memoize                                 772.00ns       1   0.00%  
│ ├─Merkle tree                                 5.23s        1   7.44%  (hash – 48.16%)
│ │ ├─leafs                                     4.80s        1   6.83%  
│ │ └─Merkle tree                             406.91ms       1   0.58%  
│ ├─Fiat-Shamir                                24.68µs       1   0.00%  (hash –  0.00%)
│ └─extend                                      5.36s        1   7.62%  (gen  – 65.25%)
├─ext tables                                   15.92s        1  22.64%  
│ ├─randomize trace                            57.53ms       1   0.08%  (gen  –  0.70%)
│ ├─LDE                                        10.98s        1  15.60%  (LDE  – 31.18%)
│ │ ├─polynomial zero-initialization            1.83µs       1   0.00%  
│ │ ├─interpolation                             1.93s        1   2.75%  
│ │ ├─resize                                    1.16µs       1   0.00%  
│ │ ├─memory warmup                           613.01ms       1   0.87%  
│ │ ├─evaluation                                8.36s        1  11.88%  
│ │ └─memoize                                   1.53µs       1   0.00%  
│ ├─Merkle tree                                 4.89s        1   6.95%  (hash – 45.04%)
│ │ ├─leafs                                     4.01s        1   5.71%  
│ │ └─Merkle tree                             756.02ms       1   1.07%  
│ └─Fiat-Shamir                               127.55µs       1   0.00%  (hash –  0.00%)
├─quotient calculation (cached)                 4.74s        1   6.73%  (CC   – 62.59%)
│ ├─zerofier inverse                          864.25ms       1   1.23%  
│ └─evaluate AIR, compute quotient codeword     3.73s        1   5.31%  
├─quotient LDE                                  6.67s        1   9.48%  (LDE  – 18.94%)
├─hash rows of quotient segments              342.65ms       1   0.49%  (hash –  3.15%)
├─Merkle tree                                 396.10ms       1   0.56%  (hash –  3.65%)
├─out-of-domain rows                          605.21ms       1   0.86%  
├─Fiat-Shamir                                  85.84µs       1   0.00%  (hash –  0.00%)
├─linear combination                            2.98s        1   4.23%  
│ ├─base                                      515.97ms       1   0.73%  (CC   –  6.82%)
│ ├─ext                                        75.43ms       1   0.11%  (CC   –  1.00%)
│ └─quotient                                    1.13s        1   1.61%  (CC   – 14.93%)
├─DEEP                                          1.16s        1   1.66%  
│ ├─base&ext curr row                         365.85ms       1   0.52%  
│ ├─base&ext next row                         393.85ms       1   0.56%  
│ └─segmented quotient                        404.62ms       1   0.58%  
├─combined DEEP polynomial                      1.19s        1   1.69%  
│ └─sum                                         1.11s        1   1.58%  (CC   – 14.66%)
├─FRI                                           1.46s        1   2.07%  
└─open trace leafs                              1.56ms       1   0.00%  

### Categories
LDE     35.20s  50.04%
hash    10.86s  15.44%
gen      8.22s  11.68%
CC       7.57s  10.75%

Clock frequency is 14555 Hz (1024009 clock cycles / 70353 ms / 1 iterations)
Optimal clock frequency is 14904 Hz (1048576 padded height / 70353 ms / 1 iterations)
FRI domain length is 2^23

https://rust-lang.github.io/rust-clippy/master/index.html#/from_iter_instead_of_collect

aszepieniec added 7 commits May 16, 2024 15:52

profile: Track performance of filling degree-lowering table

6c1c66a

refactor: Transmit endpoints of index ranges

c65d3a9

(instead of vec of indices)

feat: Parallelize single base row substitutions

c290e43

feat: Parallelize single extension row substitutions

16e41d1

refactor: Avoid 1-wide array slicing operation

ed77913

perf: Parallelize base dual row substitutions

b5b7173

perf: Parallelize dual extension row substitutions

f20ff6f

aszepieniec requested a review from jan-ferdinand May 17, 2024 20:37

jan-ferdinand reviewed May 17, 2024

View reviewed changes

aszepieniec added 5 commits May 22, 2024 11:47

style: Use array indexing instead of get and unwrap

aff29a3

style: Use push instead of append to concatenate single element

f4f00b2

style: Use array indexing instead of get and unwrap

8c27476

refactor: Use 1d array from list of indices

373b4bb

style: Use collect instead of from_iter

f617e60

https://rust-lang.github.io/rust-clippy/master/index.html#/from_iter_instead_of_collect

jan-ferdinand approved these changes May 22, 2024

View reviewed changes

aszepieniec merged commit 9c02c64 into master May 22, 2024
5 checks passed

jan-ferdinand deleted the asz/fast-fill-degree-lowering-table branch May 22, 2024 19:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parallelize Filling of Degree-Lowering Table #284

Parallelize Filling of Degree-Lowering Table #284

aszepieniec commented May 17, 2024

jan-ferdinand left a comment

jan-ferdinand commented May 17, 2024

Sword-Smith commented May 17, 2024

Parallelize Filling of Degree-Lowering Table #284

Parallelize Filling of Degree-Lowering Table #284

Conversation

aszepieniec commented May 17, 2024

jan-ferdinand left a comment

Choose a reason for hiding this comment

jan-ferdinand commented May 17, 2024

Sword-Smith commented May 17, 2024