CodeQL version 2.18.2 doubles the amount of time spent compiling CodeQL databases #17489

Open
flowerhack opened this issue Sep 16, 2024 · 7 comments
Labels
question Further information is requested

Comments

@flowerhack

Hi there,

I'm a committer for the Chromium project & we've been experimenting with building CodeQL databases of Chromium.

Recently, we upgraded the version of CodeQL we were using to compile our databases from 2.18.1 to 2.18.2, so we could include some fixes y'all have recently merged for bugs Chromium ran into (thank you!).

However, we noticed a huge spike in the time it takes to build a CodeQL database after making this switch—it increased the build time by ~2x (and in particular some of our builders started timing out).

We can increase builder timeouts if need be, but we'd like to understand if there's some more fundamental problem here (and if so, is it specific to Chromium or has it been seen on other projects, etc), or if that sharp of an increase in build time is considered a bug.

If it's useful for your analysis, you can view our builder here: https://ci.chromium.org/ui/p/chromium/builders/ci/linux-codeql-generator. The purple failing builds started occurring around build 2331, which is when we made the version switch, and builds started working again at 2337, when we switched the version back.

You'll notice, if you expand out the steps, that the time on the index_target.py step is what is particularly taking longer. That step is primarily a wrapper around the commands to compile the CodeQL database; you can view the script here in case that's useful.

I see that 2.18.3 is now available; I may try to update with that version number, in case this was a known bug that was fixed in the very next version, but I'm reporting this now on the assumption that that's not the case.

Thanks for any help you can offer!

@flowerhack flowerhack added the question Further information is requested label Sep 16, 2024
@jketema
Contributor

jketema commented Sep 16, 2024

Hi,

Thanks for the above. It's not immediately clear to me what is taking more time, as it's not obvious to me where I can find any of the console output. Is it the tracing step or the finalization step that is taking more time, or both?

Note that with both 2.18.2 and 2.18.3 you're likely running into this issue: #17332. We fixed one possible cause of this (the fix for which will land in 2.19.0; to be released either this week or next), but we have not yet been able to confirm that the fix works for Chromium.
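One way to answer the tracing-versus-finalization question is to time each phase separately. The following is only a minimal sketch: `timed_run` and `report` are hypothetical helper names, and the commands passed to them are placeholders to be replaced with whatever the builder's wrapper script actually invokes for each phase.

```python
import subprocess
import sys
import time


def timed_run(label, command):
    """Run a command and return (label, elapsed wall-clock seconds)."""
    start = time.monotonic()
    subprocess.run(command, check=True)
    return label, time.monotonic() - start


def report(timings):
    """Format each phase's duration and its share of the total time."""
    total = sum(seconds for _, seconds in timings) or 1e-9
    return [
        f"{label}: {seconds:.1f}s ({100 * seconds / total:.0f}%)"
        for label, seconds in timings
    ]


if __name__ == "__main__":
    # Placeholder commands (assumption): substitute the real tracing and
    # finalization invocations used by the builder.
    noop = [sys.executable, "-c", "pass"]
    timings = [timed_run("trace", noop), timed_run("finalize", noop)]
    for line in report(timings):
        print(line)
```

Comparing the per-phase report from a 2.18.1 run against a 2.18.2 run would show which phase accounts for the extra time.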

@flowerhack
Author

There isn't any output available for the purple builds, unfortunately, since they timed out before they had a chance to upload their logs.

However, we did have one build run to completion while running CodeQL version 2.18.2, which I think will be helpful for debugging. You can view those logs by going to https://ci.chromium.org/ui/p/chromium/builders/ci/linux-codeql-generator/2332/overview, unchecking "Hide Succeeded Steps", and clicking "CodeQL databases with logs" next to step 19, "gsutil upload (2)". (You will need some kind of Google login to download, but any Google account should do, e.g. a Gmail account or similar. Let me know if you have any issues downloading the logs and we can come up with something else.)

At a cursory glance, the logs do indicate that the "TRAP import" step (which I believe is the lion's share of the finalization step?) takes ~30min longer on 2.18.2 than 2.18.1, which is significant, but I don't think explains most of the difference, since an extra 30min wouldn't lead to a doubling of the total time.

@jketema
Contributor

jketema commented Sep 17, 2024

Thanks for the link.

> At a cursory glance, the logs do indicate that the "TRAP import" step (which I believe is the lion's share of the finalization step?)

That's correct.

> takes ~30min longer on 2.18.2 than 2.18.1, which is significant, but I don't think explains most of the difference, since an extra 30min wouldn't lead to a doubling of the total time.

Agreed.

In general we expect a traced build to be 4x~5x slower than a regular build. From what you write it seems to be more than that, is that correct? One of the main reasons this might happen is memory pressure. The extractor can consume quite a lot of memory in certain cases (which is not something we can easily fix), which can lead to swapping. I'm not sure how much insight you have into memory usage on your build servers, but this is probably the first thing we should check. If it is a memory pressure issue, then the general advice is to either (a) reduce the parallelism of the build, or (b) switch to a machine with more memory.
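As a rough illustration of checking for memory pressure, the sketch below parses `/proc/meminfo`-style text and flags a machine that is using swap or is low on available memory. It assumes a Linux builder; the function names and the 10% threshold are arbitrary choices for illustration, not anything CodeQL prescribes.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def under_pressure(info, min_available_fraction=0.10):
    """Crude heuristic (assumption): any swap in use, or MemAvailable
    below the given fraction of MemTotal."""
    total = info.get("MemTotal", 0)
    available = info.get("MemAvailable", total)
    low = total > 0 and available / total < min_available_fraction
    return swap_used_kb(info) > 0 or low


if __name__ == "__main__":
    # Sample the live machine; only works where /proc/meminfo exists.
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    print("swap used (kB):", swap_used_kb(info))
    print("under pressure:", under_pressure(info))
```

Sampling this periodically while the traced build runs would indicate whether swapping coincides with the slow phases.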

@smowton
Contributor

smowton commented Sep 17, 2024

Had a prod at this out of curiosity -- it appears the actual databases have roughly doubled in size:

Build 2324 has

Running command in /b/s/w/ir/x/w/src: [/b/s/w/ir/x/w/src/third_party/ninja/ninja, -C, /b/s/w/ir/x/w/src/out/release, d8] ...
Finalizing codeql db.
Running TRAP import for CodeQL database at /b/s/w/ir/x/w/rc/codeql_dbsw08p5wmb/d8...
Grouping TRAP files by link target
Scanning TRAP files
Assembling TRAP files
Importing TRAP files
Merging relations
Finished writing database (relations: 2.03 GiB; string pool: 304.03 MiB).

Build 2332 has

Running command in /b/s/w/ir/x/w/src: [/b/s/w/ir/x/w/src/third_party/ninja/ninja, -C, /b/s/w/ir/x/w/src/out/release, d8] ...
Finalizing codeql db.
Running pre-finalize script /b/s/w/ir/x/w/codeql/cpp/tools/pre-finalize.sh in /b/s/w/ir/x/w/src.
Running command in /b/s/w/ir/x/w/src: [/b/s/w/ir/x/w/codeql/cpp/tools/pre-finalize.sh]
Running TRAP import for CodeQL database at /b/s/w/ir/x/w/rc/codeql_dbsho8v9azo/d8...
Grouping TRAP files by link target
Grouping unlinked TRAP files together
Scanning TRAP files
Importing TRAP files
Merging relations
Finished writing database (relations: 5.77 GiB; string pool: 304.03 MiB).

I note the surprising new appearance of the pre-finalize step, though this doesn't seem to have been changed recently and for C/C++ only exists for TRAP-caching purposes, which I believe is still beta and off by default? But crucially the db relations have ballooned from 2.03GiB to 5.77 GiB.
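To compare database sizes across builds without reading the logs by eye, the "Finished writing database" line can be parsed directly. This is a small sketch: the helper name is hypothetical, and the regular expression only handles the MiB and GiB units that appear in the two log excerpts above.

```python
import re

# Matches e.g. "relations: 2.03 GiB" in the "Finished writing database" line.
_RELATIONS = re.compile(r"relations: ([\d.]+) (MiB|GiB)")
_UNIT_BYTES = {"MiB": 1024 ** 2, "GiB": 1024 ** 3}


def relations_bytes(log_text):
    """Return the reported relations size in bytes, or None if not found."""
    match = _RELATIONS.search(log_text)
    if match is None:
        return None
    return float(match.group(1)) * _UNIT_BYTES[match.group(2)]


# The two log lines quoted from builds 2324 and 2332.
old_log = "Finished writing database (relations: 2.03 GiB; string pool: 304.03 MiB)."
new_log = "Finished writing database (relations: 5.77 GiB; string pool: 304.03 MiB)."

ratio = relations_bytes(new_log) / relations_bytes(old_log)
print(f"relations grew by {ratio:.2f}x")
```

For the two excerpts above this gives a growth factor of about 2.84x in relations, while the string pool size is unchanged.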

@jketema
Contributor

jketema commented Sep 17, 2024

> But crucially the db relations have ballooned from 2.03GiB to 5.77 GiB.

This is within the range of what I expect given the extractor issues that were fixed, which had quite some impact (especially since some were outright extractor crashes). However, it is potentially larger than it should be due to the issue underlying #17332.

@flowerhack
Author

> In general we expect a traced build to be 4x~5x slower than a regular build. From what you write it seems to be more than that, is that correct?

We have about a 2x increase in build time between CodeQL 2.18.1 vs 2.18.2. However, that was just the version change. I do not believe we switched from a "regular" to a traced build with the version change. (To make sure I understand correctly: by "traced build", you mean using codeql database trace-command, correct? That command is what we have always used.)

> the general advice is to either (a) reduce the parallelism of the build, or (b) switch to a machine with more memory.

The machine this job was run on is 32 cores with 256GB of memory. I can look into whether machines with more resources are available, but I'm curious how this compares to (1) machines used by projects of a similar size that you've seen (if any), and (2) if the build time is still surprising given those specs (or if that feels about right)?

@jketema
Contributor

jketema commented Sep 17, 2024

> > In general we expect a traced build to be 4x~5x slower than a regular build. From what you write it seems to be more than that, is that correct?
>
> We have about a 2x increase in build time between CodeQL 2.18.1 vs 2.18.2. However, that was just the version change. I do not believe we switched from a "regular" to a traced build with the version change. (To make sure I understand correctly: by "traced build", you mean using codeql database trace-command, correct? That command is what we have always used.)

That's indeed what I mean by a traced build. However, apologies for not being clear enough. What I'm interested in knowing is how the traced build time compares with the build time of your normal builds (using comparable machines). I would like to establish whether the time of the traced build, even with the newer version of CodeQL, is still within the bounds of what we think is acceptable.

> > the general advice is to either (a) reduce the parallelism of the build, or (b) switch to a machine with more memory.
>
> The machine this job was run on is 32 cores with 256GB of memory. I can look into whether machines with more resources are available, but I'm curious how this compares to (1) machines used by projects of a similar size that you've seen (if any), and (2) if the build time is still surprising given those specs (or if that feels about right)?

Before switching machines, it would really make sense to try to understand whether this is a memory pressure issue or whether there's something else going on. If the latter, switching machines is likely not going to help. To answer your questions:

  1. There are really no comparable projects except for ones that embed Chromium in one way or another, and I do not have numbers for those. Generally what is done on codebases of comparable size is to create a separate database per executable, but that's not something that really seems to apply here.
  2. This is what I'm trying to establish above, but for that I need to know how long a normal build takes on a machine with similar specs.
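The comparison being asked for is a simple ratio of traced build time to regular build time, checked against the 4x~5x expectation mentioned earlier in the thread. A minimal sketch, with hypothetical function names and made-up durations, since no regular build time has been reported here:

```python
def traced_slowdown(traced_minutes, regular_minutes):
    """Ratio of traced build time to regular build time."""
    if regular_minutes <= 0:
        raise ValueError("regular build time must be positive")
    return traced_minutes / regular_minutes


def within_expected(slowdown, upper=5.0):
    """True if the slowdown is no worse than the upper end of the
    4x~5x range quoted in this thread."""
    return slowdown <= upper


# Hypothetical durations for illustration only (not from the Chromium builder).
example = traced_slowdown(traced_minutes=270, regular_minutes=60)
print(f"{example:.1f}x slowdown, within expected range: {within_expected(example)}")
```

Both durations should come from machines with similar specs for the ratio to be meaningful.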

3 participants