ICE compiling rustc-serialize under sccache 'failed to acquire jobserver token' #42867
On the (Linux-hosted) macOS build, the same panic happens when building the simd crate. It must be something to do with the jobserver rather than the specific input code. |
Related to rust-lang/cargo#1744? |
The "failed to acquire jobserver token" message only appears in cargo, and comes from that issue I just mentioned. |
The jobserver moved into the compiler's parallel support recently. See #42682. |
Err, I had failed to update my Rust tree properly, which is why I was still seeing the old code. |
Hm, sounds bad! @rillian do you know if there's a way to reproduce this short of "check out gecko and build everything"? I'm trying some smaller local test cases but can't seem to get the same assertion to show up. |
It's even harder than that. I've only seen it in the automation builds, so we need something from that environment. I couldn't reproduce with a local gecko build. Probably the best approach is to use a taskcluster image. I can investigate further, but it will take a few days. |
Hm, ok, I'll try expanding on this a bit. Yes, jobserver support was recently added to both rustc and Cargo, with the intention of letting them create their own jobservers and integrate with foreign ones. Both rustc and Cargo have the same support code, a Jobserver inheritance is specified through the So what's most likely to be happening here is that Some questions that may help narrow this down:
|
sccache has a client-server model, and it's the server that does actual compilations, invoking rustc. The problem might be that the server and the client are not running in the same make environment or something related to that. In any case, cargo/rustc should probably avoid failing in that case... Cc: @luser |
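To make the client/server mismatch concrete: the fd numbers advertised in a jobserver variable such as `MAKEFLAGS` only mean something inside the process tree that inherited them, so a server that forwards the client's environment hands rustc numbers pointing at whatever the server happens to have open. Below is a minimal Rust sketch, not sccache's actual code, of a server scrubbing those variables before spawning a compiler. `server_side_env` is a hypothetical name, and the variable list is an assumption about which names the `jobserver` crate consults.

```rust
use std::collections::HashMap;

// Assumption: the variables through which the Rust `jobserver` crate looks
// for an inherited GNU make style jobserver.
const JOBSERVER_VARS: [&str; 3] = ["CARGO_MAKEFLAGS", "MAKEFLAGS", "MFLAGS"];

/// Hypothetical helper: the environment a compile server would use when it
/// spawns rustc for a client. Any fd numbers in the client's jobserver
/// variables refer to the client's fd table, not the server's, so they are
/// dropped instead of being forwarded.
fn server_side_env(client_env: &HashMap<String, String>) -> HashMap<String, String> {
    client_env
        .iter()
        .filter(|(k, _)| !JOBSERVER_VARS.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut client_env = HashMap::new();
    client_env.insert("MAKEFLAGS".to_string(), "-j8 --jobserver-fds=5,6".to_string());
    client_env.insert("PATH".to_string(), "/usr/bin:/bin".to_string());

    let env = server_side_env(&client_env);
    // Only PATH survives; the client's fds 5 and 6 are never advertised.
    println!("{:?}", env);
}
```

Scrubbing only stops rustc from trusting foreign fds; it does not give rustc a working jobserver, which is why having the server provide its own is the more complete fix.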
I wasn't able to reproduce with sccache disabled. |
@rillian do you know how sccache is started in the build system? Is the server started manually way near the beginning? Additionally is this perhaps invoked by |
I /think/ the sccache server starts during the |
@luser would likely know better. |
AFAICT we rely on the first sccache invocation to start it. There is code to explicitly shut down any running server before we start and after we finish. I don't see any spurious invocations of Is it possible traversing a Our builds say they're using sccache 9155425cfc038d6a60deb50816055f4e93b93ad1, but I don't see that commit in the history. Maybe it's a fork. |
Yes, our sccache build is from luser/sccache@9155425. |
Hm ok I'm still not sure how this can happen... It sounds like |
I never got time to investigate, but I can still reproduce with today's 1.20.0-nightly. |
So, there are several things going on, but I'll just start with the one that's really relevant to cargo and rust:
For all the reasons above, when I tried this morning with 1.20.0-nightly on a recent tree, it didn't fail, by sheer luck. And as I mentioned to @alexcrichton on IRC, cargo also didn't actually use the jobserver, for the same reasons. |
FWIW, other than those issues, the jobserver client support in cargo and rustc seems to work. |
Hm, so I'm still confused about how rustc/cargo end up using a broken jobserver. There should be enough guards in place all along the way to prevent use of a misconfigured or broken jobserver, it's expected that we get The error above is EBADF returned from a call to |
EBADF doesn't necessarily mean the fd is closed. It means it's not a valid fd for reading, so it could be the (perfectly valid) write end of a pipe. As a matter of fact, I've seen that happen in sccache itself, but sccache doesn't try to use the fds from MAKEFLAGS, so it wasn't a problem there: the process had MAKEFLAGS=--jobserver-fds=5,6, and per /proc/pid/fd, 6 and 7 were end points of the same pipe (which, I presume, came from a Rust channel). Anyway, here's the likely scenario:
That's a simplified scenario. The real-world problem this issue was opened for goes one level deeper, with rustc. The jobserver code does check that the fds are end points of pipes, but by the time the fds are checked, they might well be pipes that have nothing to do with the jobserver. Disclaimer: I haven't actually seen which fds were open in a rustc process, but this seems like a reasonable explanation. IMHO, the safest way to initialize the jobserver code is from a lazy_static (which doesn't even guarantee it would run first, but at least the chances of something else having opened fds by then are slim).
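For reference, what a jobserver client has to go on is just two numbers in `MAKEFLAGS`. The sketch below is a hypothetical parser (not the `jobserver` crate's implementation) that accepts the `--jobserver-fds=R,W` spelling quoted above and the `--jobserver-auth=R,W` spelling used by newer GNU make; it deliberately ignores the named-FIFO form. It also shows the limitation being discussed: parsing yields fd numbers only, and nothing in the string says whether fds 5 and 6 still belong to make's pipe.

```rust
/// Hypothetical helper: pull the (read, write) fd pair out of a MAKEFLAGS
/// string. Handles `--jobserver-fds=R,W` (older GNU make) and
/// `--jobserver-auth=R,W` (GNU make 4.2 and later); the `fifo:PATH` form
/// from GNU make 4.4 is not handled and yields `None`.
fn parse_jobserver_fds(makeflags: &str) -> Option<(i32, i32)> {
    for arg in makeflags.split_whitespace() {
        let value = arg
            .strip_prefix("--jobserver-fds=")
            .or_else(|| arg.strip_prefix("--jobserver-auth="));
        if let Some(value) = value {
            let (read_fd, write_fd) = value.split_once(',')?;
            return Some((read_fd.parse().ok()?, write_fd.parse().ok()?));
        }
    }
    None
}

fn main() {
    // The value observed in the process described above.
    let fds = parse_jobserver_fds("--jobserver-fds=5,6");
    println!("{:?}", fds);
    // A successful parse says nothing about what fds 5 and 6 are *now*:
    // by the time they are used they may belong to an unrelated pipe.
}
```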
Ah, I didn't actually know you could get EBADF for the wrong end of a pipe! I guess that makes sense, and it sounds quite plausible. I'm still having difficulty working out how this could happen, though. In Rust all files are atomically opened with CLOEXEC, so nothing from sccache should leak into processes spawned by sccache. Additionally practically the first thing Cargo does is call I wonder if fds are being leaked into Regardless, it does sound like the something crate needs to be even more resilient to a "misconfigured" environment. I'm not really sure what that would look like in terms of a reasonable implementation, though... I wonder if |
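The EBADF behaviour is easy to confirm in isolation. This sketch is Unix-only and assumes a `cat` binary on `PATH`: it obtains the write end of a genuine pipe by spawning a child with a piped stdin, then tries to read from it. The descriptor is open and valid, yet `read` fails with EBADF (errno 9 on both Linux and macOS), just as described above.

```rust
use std::fs::File;
use std::io::Read;
use std::os::unix::io::OwnedFd;
use std::process::{Command, Stdio};

/// Read from the write end of a real pipe and return the raw errno, if any.
/// Unix-only; assumes a `cat` binary is on PATH.
fn errno_reading_write_end() -> Option<i32> {
    // With a piped stdin, the parent keeps the *write* end of the pipe.
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .spawn()
        .expect("failed to spawn `cat`");
    let write_end: OwnedFd = child.stdin.take().expect("stdin is piped").into();
    let mut file = File::from(write_end);

    // The fd is open and valid, just not open for reading.
    let mut buf = [0u8; 1];
    let errno = file.read(&mut buf).err().and_then(|e| e.raw_os_error());

    drop(file); // closing the write end lets `cat` hit EOF and exit
    child.wait().expect("failed to wait for `cat`");
    errno
}

fn main() {
    // EBADF is 9 on both Linux and macOS.
    const EBADF: i32 = 9;
    let errno = errno_reading_write_end();
    println!("read() on a pipe's write end failed with errno {:?}", errno);
    assert_eq!(errno, Some(EBADF));
}
```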
There are, but I think they are red herrings... as a matter of fact, when the fds from make are passed down (and still other fds leaked), as noted in #42867 (comment) , this actually works properly. |
I double-checked my last comment by taking the same Firefox tree from @rillian's try push and applying bug 1367940 on top of it: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b7b58c097d8c0b4340ee88c982c663655b51afcb |
I'm running into this same failure (ICE with 'failed to acquire jobserver token') when doing a standalone webrender build (no gecko/mozilla-central involved) with sccache. I don't get the error without sccache. The machine it's running on is also pretty minimal - it just has the bare minimum of things needed to build webrender. Latest stable rust/cargo (1.20.0) and latest sccache (0.2.1). The build is run on the machine through taskcluster integration, but the machine is in a server farm we rented from macstadium and that we're managing manually. You can see the full taskcluster log at https://tools.taskcluster.net/groups/BqgBtyCvQUqgDaQAJglrUA/tasks/Q8UMqP9TR7-va2ztvs-lSw/runs/0/logs/public%2Flogs%2Flive.log It would be great to get this resolved as we want to use this setup for webrender CI and being able to use sccache means we'll get much faster turnaround. |
@staktrace hm, that's a very curious error... The OS error there is "Resource temporarily unavailable" with code 35, which on macOS is EAGAIN (code 35 corresponds to EDEADLK only on Linux). Apparently calling Are you running |
@alexcrichton It is getting spawned naturally via the first invocation. |
I... still have no idea how to explain this. Using sccache sort of fundamentally can't work with how jobservers are designed, but rustc should have a check to prevent it from using bad fds. This means that somehow bad fds are leaking in, and I have no idea how to explain that. In any case none of this is really supposed to work anyway (sccache working with jobservers) so I think we need to either:
I'm personally a little in favor of (2) |
I haven't looked into this at all yet, but wanted to mention I'm getting similar errors when trying to enable this on Servo CI. |
This commit alters the main sccache server to operate and orchestrate its own GNU make style jobserver. This is primarily intended for interoperation with rustc itself.

The Rust compiler currently has a multithreaded mode where it will execute code generation and optimization on the LLVM side of things in parallel. This parallelism, however, can overload a machine quickly if not properly accounted for (e.g. if 10 rustcs all spawn 10 threads...). The usage of a GNU make style jobserver is intended to arbitrate and rate limit all these rustc instances to ensure that one build's maximal parallelism never exceeds a particular amount.

Currently, for Rust, Cargo is the primary driver for setting up a jobserver. Cargo creates and manages one per compilation, ensuring that any one `cargo build` invocation never exceeds a maximal parallelism. When sccache enters the picture, however, the story gets slightly odder. The jobserver implementation on Unix relies on inheritance of file descriptors in spawned processes. With sccache, however, there's no inheritance, as the actual rustc invocation is spawned by the server, not the client. In this case the env vars used to configure the jobserver are usually incorrect.

To handle this problem, this commit bakes a jobserver directly into sccache itself. The jobserver then overrides whatever jobserver the client has configured in its own env vars to ensure correct operation. The settings of each jobserver may be misconfigured (there's no way to configure sccache's jobserver right now), but hopefully that's not too much of a problem for the foreseeable future.

The implementation here provides a thin wrapper around the `jobserver` crate with a futures-based interface. This interface was then hooked into the mock command infrastructure to automatically acquire a jobserver token when spawning a process and automatically drop the token when the process exits. Additionally, all spawned processes will now automatically receive a configured jobserver.

cc rust-lang/rust#42867, the original motivation for this commit
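The token discipline in that commit message (take a token before spawning, give it back when the process exits) can be modelled in-process. The sketch below is a stand-in under stated assumptions, not the sccache code: a `Mutex`/`Condvar` counting semaphore plays the role of the pipe of token bytes, threads play the role of rustc processes, and returning the token is tied to `Drop` in the same spirit as tying it to process exit. It ignores the GNU make convention that every process implicitly owns one token.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// In-process stand-in for a jobserver: a fixed pool of tokens.
struct TokenPool {
    available: Mutex<usize>,
    returned: Condvar,
}

/// A held token; it goes back to the pool when dropped.
struct Token<'a> {
    pool: &'a TokenPool,
}

impl TokenPool {
    fn new(tokens: usize) -> Self {
        TokenPool { available: Mutex::new(tokens), returned: Condvar::new() }
    }

    /// Block until a token is free, then take it.
    fn acquire(&self) -> Token<'_> {
        let mut n = self.available.lock().unwrap();
        while *n == 0 {
            n = self.returned.wait(n).unwrap();
        }
        *n -= 1;
        Token { pool: self }
    }
}

impl Drop for Token<'_> {
    fn drop(&mut self) {
        *self.pool.available.lock().unwrap() += 1;
        self.pool.returned.notify_one();
    }
}

/// Run `jobs` simulated compiles against a pool of `limit` tokens and
/// report the highest number that were ever running at once.
fn peak_concurrency(limit: usize, jobs: usize) -> usize {
    let pool = TokenPool::new(limit);
    let running = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..jobs {
            s.spawn(|| {
                let _token = pool.acquire(); // acquire before "spawning"
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(5)); // the "compile"
                running.fetch_sub(1, Ordering::SeqCst);
                // "Process exit": `_token` is dropped here, returning it.
            });
        }
    });
    peak.load(Ordering::SeqCst)
}

fn main() {
    let peak = peak_concurrency(3, 12);
    println!("peak concurrency with 3 tokens: {}", peak);
    assert!(peak <= 3);
}
```

Because each token is held for the whole simulated compile, the observed peak can never exceed the pool size, however many jobs are queued.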
I've opened mozilla/sccache#185 with my personal preferred strategy of adding a jobserver to sccache. |
Make sets its jobserver pipe file descriptors to non-blocking mode when it's configured to use |
Yeah, the |
@whitslack excellent point! I've got a fix for your and @luser's issue at rust-lang/jobserver-rs#2 |
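If make has put the jobserver pipe into non-blocking mode, a plain `read` on it returns EAGAIN ("Resource temporarily unavailable", surfaced in Rust as `ErrorKind::WouldBlock`) when no token is available, instead of waiting. One crude way to tolerate that is to treat `WouldBlock` as "no token yet" and retry. The helper and test double below are hypothetical and only illustrate the idea; the actual fix in rust-lang/jobserver-rs#2 may well differ, and waiting with `poll(2)` would avoid the sleep loop.

```rust
use std::io::{self, ErrorKind, Read};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: read one token byte from a jobserver pipe that may
/// have been switched to non-blocking mode by someone else. EAGAIN surfaces
/// in Rust as `ErrorKind::WouldBlock` and is treated as "no token yet".
fn acquire_token_byte<R: Read>(pipe: &mut R) -> io::Result<u8> {
    let mut buf = [0u8; 1];
    loop {
        match pipe.read(&mut buf) {
            Ok(0) => {
                return Err(io::Error::new(ErrorKind::UnexpectedEof, "jobserver pipe closed"))
            }
            Ok(_) => return Ok(buf[0]),
            // Crude back-off; a real implementation would rather poll(2) the fd.
            Err(e) if e.kind() == ErrorKind::WouldBlock => thread::sleep(Duration::from_millis(1)),
            Err(e) if e.kind() == ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
}

/// Test double: reports WouldBlock a few times before handing out a token.
struct NonBlockingPipe {
    empty_reads: usize,
    token: u8,
}

impl Read for NonBlockingPipe {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.empty_reads > 0 {
            self.empty_reads -= 1;
            return Err(ErrorKind::WouldBlock.into());
        }
        buf[0] = self.token;
        Ok(1)
    }
}

fn main() {
    let mut pipe = NonBlockingPipe { empty_reads: 3, token: b'+' };
    let token = acquire_token_byte(&mut pipe).expect("token");
    println!("got token {:?} after retrying", token as char);
}
```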
Any update on this? |
@luser want to take a look at mozilla/sccache#185 and see if you agree with it? |
I finally merged @alexcrichton's change into sccache. If someone wants to test to see if this indeed fixes the issues in Servo CI and elsewhere I'd be happy to tag a new sccache release to get new binaries. |
@luser I'd definitely be interested in trying out the latest in Servo CI! If you cut a new release (or even a pre-release/beta since we're testing) and we can grab some new binaries, we can give it a whirl. |
Re-enable sccache for Linux builds. As far as I know, sccache is working properly on the non-cross-compiling Linux builders. For safety, only enable it for the builders that run on PRs, to avoid breaking our nightly generation and scheduled test runs. This will also allow testing new versions of sccache more easily. This implements my suggestion from #19858 (comment), and should also let us handle testing a new sccache: rust-lang/rust#42867 (comment) (our current version of sccache [seems to be 2018-01-09](https://github.com/servo/saltfs/blob/f50214b8fa012e03616ecae1ef2913e6fe9044da/servo-build-dependencies/ci-map.jinja#L5)). - [ ] `./mach build -d` does not report any errors - [x] `./mach test-tidy` does not report any errors - [ ] These changes fix #__ (github issue number if applicable). - [ ] There are tests for these changes OR - [x] These changes do not require tests because they change the CI configuration
FWIW my STR from #42867 (comment) seem to be fixed with the latest published sccache (0.2.6) |
Hurray! |
Failure building Firefox with rustc 1.20.0-nightly (ab5bec2 2017-06-22).
I can't reproduce it on a local build with --enable-stylo, so it's something to do with the sccache wrapper or the build environment.