Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SnoopPrecompile #2441

Merged
merged 12 commits into from
Jan 20, 2023
Merged

Use SnoopPrecompile #2441

merged 12 commits into from
Jan 20, 2023

Conversation

rikhuijzer
Copy link
Collaborator

@rikhuijzer rikhuijzer commented Jan 12, 2023

Now that caching of LLVM is available in Julia 1.9, PrecompileSignatures.jl is likely not the most useful anymore. It's better to run code in a SnoopPrecompile.@precompile_all_calls block what this PR does.

This used to be the compilation time when running the tests with ENV["PLUTO_TEST_ONLY_COMPILETIMES"] = true on Julia 1.9.0-beta2:

 ────────────────────────────────────────────────────────────────────────
                                              Time          Allocations
                                        ───────────────   ───────────────
             Total measured:                 93.9s            1.67GiB

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    73.7s   83.9%    131MiB    7.7%
 compiletimes.jl                     1    14.1s   16.1%   1.54GiB   92.3%
   PlutoRunner.run_expression        1    606ms    0.7%   38.8MiB    2.3%
   SessionActions.open               1    5.23s    6.0%    735MiB   43.1%
   Pluto.run                         1    2.00s    2.3%    215MiB   12.6%
 ────────────────────────────────────────────────────────────────────────

After removing PrecompileSignatures.jl and adding SnoopPrecompile.@precompile_all_calls (this PR), the benchmark shows:

────────────────────────────────────────────────────────────────────────
                                             Time          Allocations
                                       ───────────────   ───────────────
            Total measured:                 43.7s            2.53GiB

Section                        ncalls     time    %tot     alloc    %tot
────────────────────────────────────────────────────────────────────────
import Pluto                        1    13.7s   36.5%   46.2MiB    1.8%
compiletimes.jl                     1    23.8s   63.5%   2.48GiB   98.2%
  PlutoRunner.run_expression        1   24.6ms    0.1%    296KiB    0.0%
  SessionActions.open               1    16.0s   42.6%   1.69GiB   67.2%
  Pluto.run                         1    1.14s    3.0%    168MiB    6.5%
────────────────────────────────────────────────────────────────────────

From this we can conclude that, import Pluto is way faster for some reason and SessionActions.open is way slower. Notice especially that PlutoRunner.run_expression is way faster. So the potential is there if only we can avoid side-effects. Overall, SnoopPrecompile should make things faster. SnoopPrecompile is fundamentally better than PrecompileSignatures because it ensures that all functions in the call can be inferred and compiled whereas the inference in PrecompileSignatures will give up at some point which, in turn, causes not everything to be compiled.

In general, the TTFX can probably be reduced further by finding things to call during the SnoopPrecompile.@precompile_all_calls phase. Finding these things is very tricky because it calling exactly the right code which (1) compiles as much as possible (2) doesn't cause unwanted side-effects such as network requests or file writes. I've tried to run a full server inside the precompilation phase but that caused a SIGKILL on my 16 GB RAM machine, so we probably have to be very careful with that.

@rikhuijzer rikhuijzer requested a review from fonsp January 12, 2023 12:42
@github-actions
Copy link
Contributor

Try this Pull Request!

Open Julia and type:

julia> import Pkg
julia> Pkg.activate(temp=true)
julia> Pkg.add(url="https://github.com/fonsp/Pluto.jl", rev="rh/snoopprecompile")
julia> using Pluto

src/Pluto.jl Show resolved Hide resolved
)
end
expr = Expr(:toplevel, :(1 + 1))
Pluto.PlutoRunner.run_expression(__Foo, expr, __TEST_NOTEBOOK_ID, uuid1(), nothing);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this precompilation could benefit notebook processes too (it is only called for simple markdown cells on the Pluto process), maybe with #1881 ?

Copy link
Collaborator Author

@rikhuijzer rikhuijzer Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried to add the following to src/precompile.jl:

    session = Pluto.ServerSession()
    session.options.evaluation.workspace_use_distributed = false
    basic = joinpath(pkgdir(Pluto), "sample", "Basic.jl")
    nb = load_notebook(basic)
    Pluto.WorkspaceManager.make_workspace((session, nb))

This grinds my PC to a halt. Even Firefox becomes unresponsive even though the compilations occurs on only one thread. I suspect that the calls to Distributed or some of the other async calls are the problem.

@rikhuijzer
Copy link
Collaborator Author

The benchmark after 86de0d4 looks as follows:

 ────────────────────────────────────────────────────────────────────────
                                              Time          Allocations
                                        ───────────────   ───────────────
             Total measured:                 48.8s            2.35GiB

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    22.3s   52.2%   63.9MiB    2.7%
 compiletimes.jl                     1    20.4s   47.8%   2.28GiB   97.3%
   PlutoRunner.run_expression        1   20.2ms    0.0%    296KiB    0.0%
   SessionActions.open               1    11.2s   26.2%   1.30GiB   55.3%
   Pluto.run                         1    2.91s    6.8%    375MiB   15.6%
 ────────────────────────────────────────────────────────────────────────

It looks like reducing the time for SessionActions.open causes the seconds to be added to the import Pluto time. Some of the precompilation is now focused on the PlutoRunner which is likely the best investment because that code would otherwise be recompiled for each notebook.

@fonsp
Copy link
Owner

fonsp commented Jan 16, 2023

Thanks @rikhuijzer ! Awesome, this is giving a 2x speedup in loading times? 😮

About the balance: in the past, we always tried to do as much loading as possible during import Pluto or precompilation. All loading time after Pluto.run() makes Pluto look slow and unresponsive, especially if it happens after the browser window has opened (which is the case with SessionActions.open).

How much compilation time can be moved from SessionActions.open to import Pluto?

@fonsp fonsp mentioned this pull request Jan 16, 2023
@rikhuijzer
Copy link
Collaborator Author

rikhuijzer commented Jan 17, 2023

How much compilation time can be moved from SessionActions.open to import Pluto?

The most aggressive compilation is by using SnoopPrecompile and PrecompileSignatures at the same time (see d94064b). That gives:

 ────────────────────────────────────────────────────────────────────────
                                              Time          Allocations
                                        ───────────────   ───────────────
             Total measured:                 77.7s            1.57GiB

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    59.7s   83.3%    132MiB    8.2%
 compiletimes.jl                     1    12.0s   16.7%   1.44GiB   91.8%
   PlutoRunner.run_expression        1   17.7ms    0.0%    274KiB    0.0%
   SessionActions.open               1    4.51s    6.3%    687MiB   42.8%
   Pluto.run                         1    1.84s    2.6%    216MiB   13.4%
 ────────────────────────────────────────────────────────────────────────

So, compared to before this PR, this reduces the compilation by about (1.54 GiB - 1.44 GiB) / 1.54 GiB = 6%. I'm using allocations as an estimate here because the running time is affected by how quickly the packages are updated.

Maybe this is good enough for now? If we really want to reduce compilation, we should go to HTTP.jl and figure out how SnoopPrecompile can be used effectively there without introducing side-effects during precompilation (side effects are frowned upon based on JuliaWeb/HTTP.jl#750 (comment)). Or, we do start a HTTP server and request from it during Pluto precompilation, if we dare that. Also, we should run benchmarks to find other spots to call SnoopPrecompile on.

@rikhuijzer rikhuijzer changed the title Switch to SnoopPrecompile Use SnoopPrecompile Jan 17, 2023
basic = joinpath(pkgdir(Pluto), "sample", "Basic.jl")
nb = load_notebook(basic)

# Compiling PlutoRunner is very beneficial because it saves time in each notebook.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But PlutoRunner is used by the notebook process, not the server process. Are we still getting a benefit?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need #1881 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But PlutoRunner is used by the notebook process, not the server process. Are we still getting a benefit?

Ah! Because Pluto is not loaded in the notebook process, so the precompilation cache is also not loaded?

Yes maybe we need #1881 then indeed

Copy link
Collaborator Author

@rikhuijzer rikhuijzer Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it looks like precompiling PlutoRunner has no effect currently.

This is in the server process:

julia> using Pluto

julia> @time @eval Pluto.PlutoRunner.show_richest(IOBuffer(), (; n=1));
  0.009845 seconds (2.36 k allocations: 139.334 KiB, 85.56% compilation time)

and then after starting Pluto and running a new notebook with:

PlutoRunner.show_richest(IOBuffer(), (; n=1))

takes more than 1 seconds on the first run.

@fonsp
Copy link
Owner

fonsp commented Jan 18, 2023

Awesome! I'm calling with Rik after compilation time this afternoon at 3pm CET if anyone wants to join!

@fonsp
Copy link
Owner

fonsp commented Jan 18, 2023

I made a notebook that shows how to trigger lots of Pluto's functionality without starting a process or opening a socket.

https://htmlview.glitch.me/?https://gist.github.com/fonsp/e005fa2c11d9dc92d21bfbc6f3780394

@rikhuijzer
Copy link
Collaborator Author

I made a notebook that shows how to trigger lots of Pluto's functionality without starting a process or opening a socket.

https://htmlview.glitch.me/?https://gist.github.com/fonsp/e005fa2c11d9dc92d21bfbc6f3780394

Very nice!! Really nice how you avoided some IO while still hitting lots of code.

I've added the suggestions in 98ec719 and ran the benchmark with Julia 1.9.0-beta3. First another run based on main:

────────────────────────────────────────────────────────────────────────
                                              Time          Allocations
                                        ───────────────   ───────────────
             Total measured:                 20.7s            1.69GiB

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    930ms    6.4%    126MiB    7.3%
 compiletimes.jl                     1    13.6s   93.6%   1.56GiB   92.7%
   PlutoRunner.run_expression        1    598ms    4.1%   39.0MiB    2.3%
   SessionActions.open               1    5.18s   35.7%    747MiB   43.3%
   Pluto.run                         1    1.88s   13.0%    224MiB   13.0%
 ────────────────────────────────────────────────────────────────────────

Then, with the stuff that I wrote (commit 13de63f):

────────────────────────────────────────────────────────────────────────
                                              Time          Allocations
                                        ───────────────   ───────────────
             Total measured:                 19.8s            1.62GiB

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    900ms    6.6%    127MiB    7.7%
 compiletimes.jl                     1    12.7s   93.4%   1.49GiB   92.3%
   PlutoRunner.run_expression        1   26.2ms    0.2%    916KiB    0.1%
   SessionActions.open               1    4.96s   36.4%    724MiB   43.8%
   Pluto.run                         1    1.89s   13.9%    224MiB   13.6%
 ────────────────────────────────────────────────────────────────────────

and then including what you suggested (commit 98ec719):

────────────────────────────────────────────────────────────────────────
                                              Time          Allocations
                                        ───────────────   ───────────────
             Total measured:                 19.0s            1.58GiB

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    911ms    7.1%    129MiB    8.0%
 compiletimes.jl                     1    11.9s   92.9%   1.45GiB   92.0%
   PlutoRunner.run_expression        1   26.4ms    0.2%    916KiB    0.1%
   SessionActions.open               1    4.43s   34.6%    694MiB   43.0%
   Pluto.run                         1    1.65s   12.9%    205MiB   12.7%
 ────────────────────────────────────────────────────────────────────────

Generally, it looks like the switch from 1.9.0-beta2 to 1.9.0-beta3 reduced the import Pluto time by 59 seconds (a 99% reduction). Then related to the changed we made here, the suggestions do help a bit in the benchmarks and probably also outside of the things we measure. Furthermore, we'll probably see further gains with #1881.

I've also benchmarked Julia 1.8.5. This is on main:

────────────────────────────────────────────────────────────────────────
                                              Time          Allocations
                                        ───────────────   ───────────────
             Total measured:                 29.5s            2.29GiB

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    1.25s    5.5%    282MiB   12.1%
 compiletimes.jl                     1    21.4s   94.5%   2.00GiB   87.9%
   PlutoRunner.run_expression        1    1.11s    4.9%   67.0MiB    2.9%
   SessionActions.open               1    11.7s   51.7%   0.93GiB   41.0%
   Pluto.run                         1    2.42s   10.7%    260MiB   11.2%
 ────────────────────────────────────────────────────────────────────────

and this is this PR (commit 98ec719):

────────────────────────────────────────────────────────────────────────
                                              Time          Allocations
                                        ───────────────   ───────────────
             Total measured:                 29.4s            2.13GiB

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    1.35s    6.0%    288MiB   13.3%
 compiletimes.jl                     1    21.2s   94.0%   1.83GiB   86.7%
   PlutoRunner.run_expression        1    941ms    4.2%   3.05MiB    0.1%
   SessionActions.open               1    11.4s   50.8%    879MiB   40.6%
   Pluto.run                         1    2.54s   11.3%    235MiB   10.8%
 ────────────────────────────────────────────────────────────────────────

I didn't test 1.6 because, well, if you use that you will be waiting on Julia anyway.

Overall, it looks like this PR is a slight improvement thanks to SnoopPrecompile.

.gitignore Outdated Show resolved Hide resolved
@fonsp
Copy link
Owner

fonsp commented Jan 19, 2023

Hmmmm, I wanted to measure how long precompilation takes, and I found out that this takes 4x longer on 1.9 😭 roughly 40 seconds vs 10 seconds.

Pluto main branch

To get these results, you need to make sure that Pluto is the only package that gets precompiled by Pkg.

  1. Remove PlutoSliderServer from your environment
  2. Run Pkg.precompile()
  3. ] dev Pluto and add a space somewhere in the code.

The results below are on Pluto#main, the results on this PR are very similar.

PLUTO_TEST_ONLY_COMPILETIMES=true julia18 -e "import Pkg; @time Pkg.precompile(); Pkg.test(\"Pluto\")"

Julia 1.8.3

➜  ~ PLUTO_TEST_ONLY_COMPILETIMES=true julia18 -e "import Pkg; @time Pkg.precompile(); Pkg.test(\"Pluto\")"
Precompiling project...
  1 dependency successfully precompiled in 10 seconds. 166 already precompiled.
 10.378297 seconds (2.19 M allocations: 174.977 MiB, 0.27% gc time, 0.65% compilation time)
 
 ────────────────────────────────────────────────────────────────────────
                                              Time          Allocations  
                                        ───────────────   ───────────────
             Total measured:                 24.3s            2.16GiB    

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    1.17s    6.6%    255MiB   11.7%
 compiletimes.jl                     1    16.5s   93.4%   1.88GiB   88.3%
   PlutoRunner.run_expression        1    798ms    4.5%   66.8MiB    3.1%
   SessionActions.open               1    9.67s   54.8%    951MiB   43.6%
   Pluto.run                         1    1.84s   10.5%    258MiB   11.8%
 ────────────────────────────────────────────────────────────────────────
     Testing Pluto tests passed 

Julia 1.9.0-beta3

➜  ~ PLUTO_TEST_ONLY_COMPILETIMES=true julia19 -e "import Pkg; @time Pkg.precompile(); Pkg.test(\"Pluto\")"
Precompiling environment...
  1 dependency successfully precompiled in 39 seconds. 35 already precompiled.
 39.883693 seconds (1.68 M allocations: 137.289 MiB, 0.50% gc time, 0.25% compilation time)

 ────────────────────────────────────────────────────────────────────────
                                              Time          Allocations  
                                        ───────────────   ───────────────
             Total measured:                 18.6s            1.70GiB    

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    1.53s   12.3%    127MiB    7.4%
 compiletimes.jl                     1    10.9s   87.7%   1.56GiB   92.6%
   PlutoRunner.run_expression        1    412ms    3.3%   38.9MiB    2.3%
   SessionActions.open               1    4.41s   35.4%    750MiB   43.4%
   Pluto.run                         1    1.39s   11.2%    223MiB   12.9%
 ────────────────────────────────────────────────────────────────────────
     Testing Pluto tests passed 

@fonsp
Copy link
Owner

fonsp commented Jan 19, 2023

I removed the precompile.jl file, and Julia 1.9 is still significantly slower than Julia 1.8:

Julia 1.8.3, no precompile help at all

➜  ~ PLUTO_TEST_ONLY_COMPILETIMES=true julia18 -e "import Pkg; @time Pkg.precompile(); Pkg.test(\"Pluto\")"
Precompiling project...
  1 dependency successfully precompiled in 4 seconds. 166 already precompiled.
  4.724134 seconds (2.19 M allocations: 173.663 MiB, 0.67% gc time, 1.01% compilation time)

 ────────────────────────────────────────────────────────────────────────
                                              Time          Allocations  
                                        ───────────────   ───────────────
             Total measured:                 25.5s            3.59GiB    

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    413ms    2.2%    123MiB    3.4%
 compiletimes.jl                     1    18.4s   97.8%   3.40GiB   96.6%
   PlutoRunner.run_expression        1    802ms    4.3%   93.0MiB    2.6%
   SessionActions.open               1    11.2s   59.4%   2.13GiB   60.4%
   Pluto.run                         1    2.05s   10.9%    445MiB   12.3%
 ────────────────────────────────────────────────────────────────────────
     Testing Pluto tests passed 

Julia 1.9.0-beta3, no precompile help at all

➜  ~ PLUTO_TEST_ONLY_COMPILETIMES=true julia19 -e "import Pkg; @time Pkg.precompile(); Pkg.test(\"Pluto\")"
Precompiling environment...
  1 dependency successfully precompiled in 8 seconds. 35 already precompiled.
  8.530104 seconds (1.67 M allocations: 137.464 MiB, 1.86% gc time, 1.18% compilation time)
  
 ────────────────────────────────────────────────────────────────────────
                                              Time          Allocations  
                                        ───────────────   ───────────────
             Total measured:                 27.0s            2.78GiB    

 Section                        ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────
 import Pluto                        1    910ms    4.4%   37.9MiB    1.3%
 compiletimes.jl                     1    19.9s   95.6%   2.72GiB   98.7%
   PlutoRunner.run_expression        1    540ms    2.6%   57.9MiB    2.1%
   SessionActions.open               1    11.4s   55.0%   1.64GiB   59.4%
   Pluto.run                         1    2.29s   11.0%    376MiB   13.3%
 ────────────────────────────────────────────────────────────────────────
     Testing Pluto tests passed 

@fonsp
Copy link
Owner

fonsp commented Jan 19, 2023

Should we open an issue at Julia? Before 1.9 gets released 😬

@fonsp
Copy link
Owner

fonsp commented Jan 19, 2023

I just tested merging #1881 into this PR, but it has no effect on these benchmarks. :(

@rikhuijzer
Copy link
Collaborator Author

but it has no effect on these benchmarks. :(

That makes sense. These benchmarks do not test how responsive the notebook process (PlutoRunner) is. What happens if you manually compare things like show_richest (#2441 (comment))?

@fonsp fonsp merged commit 61951fa into main Jan 20, 2023
@fonsp fonsp deleted the rh/snoopprecompile branch January 20, 2023 11:55
@fonsp
Copy link
Owner

fonsp commented Jan 20, 2023

@fonsp
Copy link
Owner

fonsp commented Jan 20, 2023

That makes sense. These benchmarks do not test how responsive the notebook process (PlutoRunner) is. What happens if you manually compare things like show_richest (#2441 (comment))?

Exactly, I think we should add some things to the benchmarks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants