Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package CI performance regression #53657

Closed
LilithHafner opened this issue Mar 8, 2024 · 20 comments
Closed

Package CI performance regression #53657

LilithHafner opened this issue Mar 8, 2024 · 20 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@LilithHafner
Copy link
Member

LilithHafner commented Mar 8, 2024

This NFC PR to Chairmarks apparently made CI on nightly substantially (>2x) slower. I suspect the regression is due to a change in Julia.

Fast (https://github.com/LilithHafner/Chairmarks.jl/actions/runs/8147371039/job/22267929268)

Julia Version 1.12.0-DEV.124
Commit 0311aa4caf (2024-03-04 16:34 UTC)

Slow (https://github.com/LilithHafner/Chairmarks.jl/actions/runs/8174911720/job/22350814940)

Julia Version 1.12.0-DEV.138
Commit 90d84d4ab00 (2024-03-06 00:32 UTC)

Diff
https://github.com/JuliaLang/julia/compare/0311aa4caf..90d84d4ab00

See also: #53439

@LilithHafner LilithHafner added performance Must go faster regression Regression in behavior compared to a previous version labels Mar 8, 2024
@Zentrik
Copy link
Member

Zentrik commented Mar 9, 2024

Bisected to fb71a5d

Bisect Script
using Downloads, CodecZlib, Tar, LibGit2

start_sha = "0311aa4caf461d05fb408de67dafc582e2ada9a0"
end_sha = "90d84d4ab00a34cd9c75b06b77925aaf8afbabfd"

bisect_command = raw"""
using Pkg, Test
Pkg.add("Chairmarks")
t = @timed try
    Pkg.test("Chairmarks")
catch
end
t.time
"""

function run_commit(file, commit, os, arch)
    version_full = read(`contrib/commit-name.sh $commit`, String)
    major_minor_version = split(version_full, '.')[1:2] |> x->join(x, '.')

    binary_url = "https://julialangnightlies-s3.julialang.org/bin/$os/$arch/$major_minor_version/julia-$(commit[1:10])-$os-$arch.tar.gz"
    download_path = Downloads.download(binary_url)

    result = mktempdir() do binary_dir
        # extract
        open(download_path) do io
            stream = GzipDecompressorStream(io)
            Tar.extract(stream, binary_dir)
        end

        read(Cmd(
            `$binary_dir/julia-$(commit[1:10])/bin/julia --project=$binary_dir -E include\(\"$(file[1])\"\)`,
            ignorestatus=true
            ), String
        )
    end

    parse(Float64, result)
end

function bisect_perf(bisect_command, start_sha, end_sha, factor=1.5, os="linux", arch="x86_64")
    dir = mktempdir()
    cd(dir)
    repo = LibGit2.clone("https://github.com/JuliaLang/julia.git", dir)

    # Get the start and end commit objects
    start_commit = LibGit2.GitCommit(repo, start_sha)
    end_commit = LibGit2.GitCommit(repo, end_sha)

    # Get the commit range
    commit_range::Vector{String} = LibGit2.with(LibGit2.GitRevWalker(repo)) do walker
        LibGit2.map((oid, repo)->string(oid), walker; range=string(LibGit2.GitHash(start_commit))*".."*string(LibGit2.GitHash(end_commit)), by=LibGit2.Consts.SORT_REVERSE)
    end
    pushfirst!(commit_range, start_sha)

    # Test script, makes it easy to run bisect command
    file = mktemp()
    write(file[1], bisect_command)

    original_time = run_commit(file, start_sha, os, arch)
    println("Starting commit took $original_time")

    left = 1
    right = length(commit_range)
    i = 0
    while left < right
        i = (left + right) ÷ 2
        commit = commit_range[i]

        result = run_commit(file, commit, os, arch)
        
        println("Commit $commit " * (result <= factor * original_time ? "succeeded" : "failed") * " in $result")

        if result <= factor * original_time
            left = i + 1
        else
            right = i
        end
    end

    return left, commit_range[left]
end

bisect_perf(bisect_command, start_sha, end_sha)

@LilithHafner
Copy link
Member Author

Thanks for the bisection! I don't suppose I could tempt you into figuring out which of the PRs to Pkg is responsible? There are only 4 candidates (listed here) and I suspect that JuliaLang/Pkg.jl#3820 is the culprit but it would be nice to know for sure.

@IanButterworth
Copy link
Sponsor Member

What is the latest master version tested here? There's been some instability

@LilithHafner
Copy link
Member Author

As of today (LilithHafner/Chairmarks.jl#77)

Julia Version 1.12.0-DEV.154
Commit 53048b2ab1e (2024-03-09 10:44 UTC)

Chairmarks still takes about 2x longer to test on nightly than on 1.10.

@IanButterworth
Copy link
Sponsor Member

Would you mind looking into the log line times to see where the main regressions are? Currently I'm looking into Base precompile sysimage/pkgimage coverage regression

@KristofferC
Copy link
Sponsor Member

KristofferC commented Mar 9, 2024

Pkg is not loaded into the process doing the testing and

Test Summary: |   Pass  Broken   Total     Time
Chairmarks    | 632151       7  632158  2m10.5s

shows that it is the testing itself that is slower?

So the only impact I can see from Pkg directly is the precompile files it generates. And the commits in that list touch that. So maybe it has something to do with that. Or something completely different.

@LilithHafner
Copy link
Member Author

In the OP

  • julia-buildpkg went from 5s to 42s
  • cache went from 5s to 17s
  • julia-processcoverage went from 4s to 47s
  • julia-runtest went from 100s to 187s (+87s)
  • testing itself (a subset of julia-runtest) went from 89s to 131s (+42s)

It looks like a combination of most parts of the pipeline getting slower.

@Zentrik
Copy link
Member

Zentrik commented Mar 9, 2024

I only bisected the testing itself, so it's possible that the other slowdowns are from some other commit.

I don't suppose I could tempt you into figuring out which of the PRs to Pkg is responsible?

I think to test the different commits I'm supposed to just do add https://github.com/JuliaLang/Pkg.jl/commit/e7d740ac82fa4c289bb58f9acb6e60eebb6d8a8e.
If that's true, the timings I was getting were far too noisy to work out what was going on.

@LilithHafner
Copy link
Member Author

To be honest, I don't know what the best way to bisect this within Pkg.jl is. Rebuilding Julia from source using a different Pkg version would probably do it, but I'm guessing that's a bit involved. It might be best to hand if over to @KristofferC who authored all 4 of those commits.

@giordano
Copy link
Contributor

giordano commented Mar 9, 2024

I don't know what the best way to bisect this within Pkg.jl is.

Did you try to follow: https://github.com/JuliaLang/Pkg.jl/blob/e0821116e9f4364a2b37a77183e93d055a111f4d/README.md#using-the-development-version-of-pkgjl?

@IanButterworth
Copy link
Sponsor Member

You don't need to change the UUID anymore, just set --project to the Pkg repo on startup

@KristofferC
Copy link
Sponsor Member

KristofferC commented Mar 11, 2024

The only thing that seemed to have gotten slower here is the "Persistent task" test in Aqua.jl https://github.com/JuliaTesting/Aqua.jl/blob/b51f80035166f716b12a896587a84c89601c86dc/src/persistent_tasks.jl#L2 where it takes 40s on fb71a5d vs 3.3s on 427da5c.

Specifically, the Pkg.precompile in https://github.com/JuliaTesting/Aqua.jl/blob/b51f80035166f716b12a896587a84c89601c86dc/src/persistent_tasks.jl#L103 went from

  1.895197 seconds (3.28 M allocations: 201.884 MiB, 6.62% gc time, 66.50% compilation time: 6% of which was recompilation)

to

 18.610014 seconds (56.49 M allocations: 2.832 GiB, 2.69% gc time, 96.87% compilation time: 1% of which was recompilation)

So from what I can see, the only difference is:

~/JuliaTests/TaskProj
❯ ~/juliaslow/julia  --startup-file=no --project -e 'using Pkg; @time Pkg.precompile()'
  0.579358 seconds (1.93 M allocations: 130.249 MiB, 19.62% gc time, 61.13% compilation time: 10% of which was recompilation)

~/JuliaTests/TaskProj
❯ ~/juliafast/julia  --startup-file=no --project -e 'using Pkg; @time Pkg.precompile()'         
 13.331489 seconds (8.72 M allocations: 461.582 MiB, 0.37% gc time, 98.81% compilation time: <1% of which was recompilation)

which says to me we have to precompile the Base.precompilepkgs function (which I think @IanButterworth started doing in #53682)..

@KristofferC
Copy link
Sponsor Member

It might be best to hand if over to @KristofferC who authored all 4 of those commits.

I don't think that is the best attitude towards someone wanting to learn how to do something. My analysis above required no advanced techniques, it was just basic stuff like adding @time and going down deeper into the functions.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Mar 11, 2024

The Aqua slowdown is already fixed on master by JuliaLang/Pkg.jl#3835.

Removing the Aqua test I get on master

Test Summary: |   Pass  Broken   Total   Time
Chairmarks    | 604534       5  604539  29.5s
     Testing Chairmarks tests passed 

and on 427da5c

Test Summary: |   Pass  Broken   Total   Time
Chairmarks    | 535213       5  535218  29.6s
     Testing Chairmarks tests passed 

so the change to Pkg did not affect this so I am "handing this back" now.

@IanButterworth
Copy link
Sponsor Member

#53657 (comment) stated the issue remained on 53048b2 which is after JuliaLang/Pkg.jl#3835 was included in master (in a998082)

Maybe there's a more general 1.12 slowdown though, as that comment compared to 1.10, not earlier 1.12 runs

@KristofferC
Copy link
Sponsor Member

Locally, for @time Pkg.test() I get a total time of 46s on master vs 44s on 427da5c and 39s on 1.10.

@LilithHafner
Copy link
Member Author

I'm guessing this regression has something to do with precomilation or caching as it is present in CI but not locally.

@IanButterworth
Copy link
Sponsor Member

How about local builds vs. juliaup's nightly build? (wondering if it's JULIA_CPU_TARGET not being set properly)

@KristofferC
Copy link
Sponsor Member

I'm guessing this regression has something to do with precomilation or caching as it is present in CI but not locally.

Probably not since older cache files are invalid for new julia versions.

@KristofferC
Copy link
Sponsor Member

I think the cause of the major issue was the missing Pkg precompilation, the rest is likely a dup of "general" slowdowns as reported in #53511 and #53570 so I am not sure another issue on that is worth having.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

5 participants