Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Profile and fix slow zero-initialization #283

Merged
merged 1 commit into from
May 21, 2024

Conversation

aszepieniec
Copy link
Collaborator

This fix uses the unsafe keyword, but we are convinced that this is safe because all elements are being written to before read from.

@Sword-Smith Sword-Smith self-requested a review May 17, 2024 21:30
Copy link
Collaborator

@Sword-Smith Sword-Smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing some worrying preliminary results with these changes. a big slowdown. Let's make sure we have a benchmark with relevant set of parameters (for recursive verification) on our current reference machine before we merge.

@Sword-Smith
Copy link
Collaborator

Sword-Smith commented May 17, 2024

With the memory warmup, that I added in my latest commit to this branch, this PR gives a speedup of about 2kHz on a canonical parameter set on our benchmark machine.

Current master

### Prove Fibonacci 00000000000000102400       94.61s    #Reps   Share  Category
├─Fiat-Shamir: claim                            3.71µs       1   0.00%  (hash –  0.00%)
├─derive additional parameters                 18.95µs       1   0.00%  
├─base tables                                  50.73s        1  53.61%  
│ ├─create                                      2.38s        1   2.52%  (gen  – 11.82%)
│ ├─pad                                         9.23s        1   9.76%  (gen  – 45.77%)
│ ├─randomize trace                            86.38ms       1   0.09%  (gen  –  0.43%)
│ ├─LDE                                        25.31s        1  26.75%  (LDE  – 52.87%)
│ ├─Merkle tree                                 5.31s        1   5.61%  (hash – 50.46%)
│ │ ├─leafs                                     4.80s        1   5.07%  
│ │ └─Merkle tree                             489.42ms       1   0.52%  
│ ├─Fiat-Shamir                                24.63µs       1   0.00%  (hash –  0.00%)
│ └─extend                                      8.41s        1   8.89%  (gen  – 41.69%)
├─ext tables                                   20.40s        1  21.56%  
│ ├─randomize trace                            58.57ms       1   0.06%  (gen  –  0.29%)
│ ├─LDE                                        15.86s        1  16.76%  (LDE  – 33.13%)
│ ├─Merkle tree                                 4.48s        1   4.73%  (hash – 42.54%)
│ │ ├─leafs                                     4.06s        1   4.29%  
│ │ └─Merkle tree                             398.66ms       1   0.42%  
│ └─Fiat-Shamir                               124.96µs       1   0.00%  (hash –  0.00%)
├─quotient calculation (cached)                 5.43s        1   5.74%  (CC   – 68.68%)
│ ├─zerofier inverse                            1.19s        1   1.25%  
│ └─evaluate AIR, compute quotient codeword     3.75s        1   3.97%  
├─quotient LDE                                  6.70s        1   7.08%  (LDE  – 14.00%)
├─hash rows of quotient segments              341.03ms       1   0.36%  (hash –  3.24%)
├─Merkle tree                                 394.98ms       1   0.42%  (hash –  3.75%)
├─out-of-domain rows                          626.86ms       1   0.66%  
├─Fiat-Shamir                                  85.34µs       1   0.00%  (hash –  0.00%)
├─linear combination                            2.87s        1   3.03%  
│ ├─base                                      416.99ms       1   0.44%  (CC   –  5.27%)
│ ├─ext                                        59.89ms       1   0.06%  (CC   –  0.76%)
│ └─quotient                                    1.14s        1   1.21%  (CC   – 14.45%)
├─DEEP                                          1.17s        1   1.23%  
│ ├─base&ext curr row                         363.59ms       1   0.38%  
│ ├─base&ext next row                         398.33ms       1   0.42%  
│ └─segmented quotient                        404.52ms       1   0.43%  
├─combined DEEP polynomial                    937.28ms       1   0.99%  
│ └─sum                                       857.15ms       1   0.91%  (CC   – 10.84%)
├─FRI                                           1.45s        1   1.53%  
└─open trace leafs                              1.50ms       1   0.00%  

### Categories
LDE     47.87s  50.59%
gen     20.17s  21.32%
hash    10.53s  11.13%
CC       7.91s   8.36%

Clock frequency is 10823 Hz (1024009 clock cycles / 94610 ms / 1 iterations)
Optimal clock frequency is 11083 Hz (1048576 padded height / 94610 ms / 1 iterations)
FRI domain length is 2^23

This PR

### Prove Fibonacci 00000000000000102400       80.96s    #Reps   Share  Category
├─Fiat-Shamir: claim                            3.97µs       1   0.00%  (hash –  0.00%)
├─derive additional parameters                 22.96µs       1   0.00%  
├─base tables                                  42.71s        1  52.76%  
│ ├─create                                      2.37s        1   2.93%  (gen  – 11.70%)
│ ├─pad                                         9.44s        1  11.66%  (gen  – 46.61%)
│ ├─randomize trace                            85.04ms       1   0.11%  (gen  –  0.42%)
│ ├─LDE                                        17.25s        1  21.31%  (LDE  – 50.29%)
│ │ ├─polynomial zero-initialization            3.00µs       1   0.00%  
│ │ ├─interpolation                             2.40s        1   2.97%  
│ │ ├─resize                                    1.06µs       1   0.00%  
│ │ ├─memory warmup                           867.27ms       1   1.07%  
│ │ ├─evaluation                               13.98s        1  17.27%  
│ │ └─memoize                                   1.24µs       1   0.00%  
│ ├─Merkle tree                                 5.26s        1   6.50%  (hash – 49.00%)
│ │ ├─leafs                                     4.82s        1   5.96%  
│ │ └─Merkle tree                             416.41ms       1   0.51%  
│ ├─Fiat-Shamir                                24.56µs       1   0.00%  (hash –  0.00%)
│ └─extend                                      8.30s        1  10.25%  (gen  – 40.98%)
├─ext tables                                   15.35s        1  18.96%  
│ ├─randomize trace                            59.74ms       1   0.07%  (gen  –  0.29%)
│ ├─LDE                                        10.54s        1  13.02%  (LDE  – 30.74%)
│ │ ├─polynomial zero-initialization            2.37µs       1   0.00%  
│ │ ├─interpolation                             1.91s        1   2.36%  
│ │ ├─resize                                    1.24µs       1   0.00%  
│ │ ├─memory warmup                           613.71ms       1   0.76%  
│ │ ├─evaluation                                8.02s        1   9.90%  
│ │ └─memoize                                   1.06µs       1   0.00%  
│ ├─Merkle tree                                 4.74s        1   5.86%  (hash – 44.15%)
│ │ ├─leafs                                     3.90s        1   4.82%  
│ │ └─Merkle tree                             726.15ms       1   0.90%  
│ └─Fiat-Shamir                               129.19µs       1   0.00%  (hash –  0.00%)
├─quotient calculation (cached)                 4.84s        1   5.98%  (CC   – 65.29%)
│ ├─zerofier inverse                          875.15ms       1   1.08%  
│ └─evaluate AIR, compute quotient codeword     3.74s        1   4.62%  
├─quotient LDE                                  6.51s        1   8.04%  (LDE  – 18.98%)
├─hash rows of quotient segments              340.45ms       1   0.42%  (hash –  3.17%)
├─Merkle tree                                 395.03ms       1   0.49%  (hash –  3.68%)
├─out-of-domain rows                          536.75ms       1   0.66%  
├─Fiat-Shamir                                  86.13µs       1   0.00%  (hash –  0.00%)
├─linear combination                            2.89s        1   3.57%  
│ ├─base                                      224.35ms       1   0.28%  (CC   –  3.02%)
│ ├─ext                                       286.31ms       1   0.35%  (CC   –  3.86%)
│ └─quotient                                    1.12s        1   1.38%  (CC   – 15.03%)
├─DEEP                                          1.16s        1   1.44%  
│ ├─base&ext curr row                         366.18ms       1   0.45%  
│ ├─base&ext next row                         398.12ms       1   0.49%  
│ └─segmented quotient                        398.08ms       1   0.49%  
├─combined DEEP polynomial                      1.03s        1   1.27%  
│ └─sum                                       949.48ms       1   1.17%  (CC   – 12.80%)
├─FRI                                           1.49s        1   1.84%  
└─open trace leafs                              1.47ms       1   0.00%  

### Categories
LDE     34.31s  42.37%
gen     20.26s  25.02%
hash    10.74s  13.27%
CC       7.42s   9.16%

Clock frequency is 12648 Hz (1024009 clock cycles / 80958 ms / 1 iterations)
Optimal clock frequency is 12952 Hz (1048576 padded height / 80958 ms / 1 iterations)
FRI domain length is 2^23

@Sword-Smith Sword-Smith self-requested a review May 17, 2024 22:47
@jan-ferdinand
Copy link
Member

The line you added also makes it glaringly obvious that every element is in fact written to before it is read from, which I like tremendously.

Copy link
Collaborator

@Sword-Smith Sword-Smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add comment about why memory warmup is needed? See e.g. commit message for 21cc30e

Otherwise happy.

@Sword-Smith Sword-Smith self-requested a review May 17, 2024 23:19
triton-vm/src/table/master_table.rs Show resolved Hide resolved
triton-vm/src/table/master_table.rs Outdated Show resolved Hide resolved
triton-vm/src/table/master_table.rs Outdated Show resolved Hide resolved
triton-vm/src/table/master_table.rs Outdated Show resolved Hide resolved
This fix uses the unsafe keyword, but Thorkil is (speaking as "we"
are) convinced that this is safe because all elements are being
written to before read from.

Co-authored-by: Thorkil Værge <[email protected]>
@jan-ferdinand jan-ferdinand merged commit f7b13e7 into master May 21, 2024
5 checks passed
@jan-ferdinand jan-ferdinand deleted the asz/no-zero-initialization branch May 21, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants