Compiler doesn't work when using Docker #65662

Closed
jethrogb opened this issue Oct 21, 2019 · 16 comments · Fixed by #65685
Labels
C-bug Category: This is a bug. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@jethrogb
Contributor

When I run this sequence of commands:

docker run --rm -it ubuntu:xenial
apt update
apt install wget
wget https://sh.rustup.rs -O rustup.sh
chmod ugo+x rustup.sh 
./rustup.sh -y --default-toolchain nightly
source $HOME/.cargo/env
rustc -vV

I get this error:

rustc 1.40.0-nightly (7979016af 2019-10-20)
binary: rustc
commit-hash: 7979016aff545f7b41cc517031026020b340989d
commit-date: 2019-10-20
host: x86_64-unknown-linux-gnu
release: 1.40.0-nightly
error: failed to find a `codegen-backends` folder in the sysroot candidates:
* /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu
* /root/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu

This happens due to the statx syscall failing with EPERM. I believe Docker uses seccomp to limit which system calls may be made, and the statx call is too new, so it's not whitelisted. Because the syscall fails with EPERM instead of ENOSYS, the fallback to regular stat doesn't work.

Host kernel: 4.15.0-65-generic

docker version

Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.6.2
 Git commit:   f5ec1e2
 Built:        Thu Jul  5 23:07:48 2018
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.6.2
 Git commit:   f5ec1e2
 Built:        Thu Jul  5 23:07:48 2018
 OS/Arch:      linux/amd64
 Experimental: false
@Mark-Simulacrum Mark-Simulacrum added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Oct 21, 2019
@Mark-Simulacrum
Member

Nominating for libs team as this is a recent regression caused by #65094 -- cc @alexcrichton

Personally, this leads me to believe we should either be more willing to fall back to the previous code (e.g., for any error) or revert the PR entirely, since breaking Rust usage in Docker is not really an option (even if this is arguably a Docker bug).

@jethrogb
Contributor Author

> (e.g., for any error)

Alternatively, we can try statx on something that should always succeed (maybe /?)

@mati865
Contributor

mati865 commented Oct 21, 2019

Can't it match ENOSYS or EPERM, just like the getrandom syscall handling does?

} else if err == libc::ENOSYS || err == libc::EPERM {

@jethrogb
Contributor Author

Unlike for getrandom, getting EPERM is not sufficient evidence that statx doesn't work.

@alexcrichton
Member

Thanks for investigating this down to statx being the culprit, definitely makes sense!

I searched around for other instances of this, and it definitely turns out we're not the only ones running into it.

Overall, I don't think there's much prior art to draw from; it seems that everyone is working around the seccomp issue rather than addressing it directly.

I think a reasonable solution might be to do something like try to stat AT_CWD initially and if that fails with EPERM or ENOSYS we disable the syscall entirely, otherwise it's cached as always good to use. @oxalica would you be interested in helping to implement this?

@oxalica
Contributor

oxalica commented Oct 21, 2019

> I think a reasonable solution might be to do something like try to stat AT_CWD initially and if that fails with EPERM or ENOSYS we disable the syscall entirely, otherwise it's cached as always good to use. @oxalica would you be interested in helping to implement this?

@alexcrichton
Do you mean checking the current working directory, instead of the actual argument, before the first call, and treating EPERM and ENOSYS as meaning statx is not implemented?
I'm glad to fix it.

@alexcrichton
Member

@oxalica yeah that's what I'm thinking, the idea being that if we don't know whether statx works we invoke it with a path which we know should work almost all the time (like the cwd) and if it returns EPERM in that situation we can be pretty certain we're seccomp blocked.

@mguillemot-tel

We are encountering a similar issue while building a Docker image with rustc & cargo on CircleCI. We are basically doing:

rustup-init -y --no-modify-path --default-toolchain nightly
rustup --version
cargo --version
rustc --version
cargo fmt --version

and are getting:

rustup 1.20.0 (a7f257941 2019-10-14)
cargo 1.40.0-nightly (3a9abe3f0 2019-10-15)
rustc 1.40.0-nightly (7979016af 2019-10-20)
error: no such subcommand: `fmt`

This error started yesterday. Our last successful build with the same script was 2 days ago, with these versions:

rustup 1.20.0 (a7f257941 2019-10-14)
cargo 1.40.0-nightly (3a9abe3f0 2019-10-15)
rustc 1.40.0-nightly (c23a7aa77 2019-10-19)
rustfmt 1.4.9-nightly (33e3667 2019-10-07)

So basically between rustc c23a7aa77 2019-10-19 and rustc 7979016af 2019-10-20, something changed that makes cargo unable to figure out that rustfmt is installed.

After many tests on different Docker hosts, we have narrowed the issue down to the following: if the Docker host is using the overlay2 or btrfs storage drivers, everything works perfectly; if it is using the aufs storage driver, then cargo fmt basically cannot see that the ~/.cargo/bin/cargo-fix file exists, and treats rustfmt as not installed.

@jethrogb
Contributor Author

@mguillemot-tel On the hosts where things don't work, what is the docker version and host kernel version? Also, can you run your command with strace to see if this is because of statx?

@oxalica
Contributor

oxalica commented Oct 23, 2019

I tried ubuntu:xenial with docker 18.09.2 on Linux kernel 4.14.118, it seems that statx works well.

root@86646b35b199:~# cat >test.c
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

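#ifndef SYS_statx
#define SYS_statx 332 // x86_64; only needed if the headers predate statx
#endif
#define STATX_ALL 0xFFF

int main (void) {
    char buf[0x100] = {}; // room for struct statx (256 bytes)
    // dirfd 0 is ignored because "/" is an absolute path
    int ret = syscall(SYS_statx, 0, "/", 0, STATX_ALL, buf);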
    if (ret == 0)
        puts("ok");
    else {
        int e = errno;
        perror("err");
        printf("errno = %d\n", e);
    }
    return 0;
}
root@86646b35b199:~# gcc test.c
root@86646b35b199:~# ./a.out
ok

Maybe you need to update Docker?

@jhfrontz
Copy link

Related? containers/buildah#1568

@mati865
Contributor

mati865 commented Oct 23, 2019

You need fairly recent Docker and libseccomp packages.

> if the Docker host is using the overlay2 or btrfs storage drivers, everything works perfectly; if it is using aufs storage driver, then cargo fmt basically cannot see that the ~/.cargo/bin/cargo-fix file exists

Old Docker versions (or new versions running on old systems) default to aufs, but in recent versions this has changed to overlay2. That's a possible explanation of what you are seeing.

@sorccu

sorccu commented Oct 24, 2019

On CircleCI you cannot choose the storage driver, though you can opt in to newer Docker versions, which we've done.

docker info on CircleCI

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.09.3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 7
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-1027-gcp
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.298GiB
Name: default-ccdb4d48-4a1e-40b1-90af-0ccd64a89a94
ID: LYQL:2PXY:AHV3:PLE6:H4ZU:CQQR:IY7N:TE53:LP4R:4ECZ:ZUN5:OGKJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 provider=generic
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Kernel and Docker versions are fairly recent, but aufs is indeed being used.

I also tried @oxalica's snippet on a standard ubuntu:xenial image, which produced the following output:

err: Operation not permitted
errno = 1

Now, CircleCI may or may not be doing something unorthodox here, but I think it is clear that this change has broken things for potentially quite a few users who do not have the capability to simply update or reconfigure their docker environment by themselves.

Centril added a commit to Centril/rust that referenced this issue Oct 25, 2019
Fix check of `statx` and handle EPERM

Should fix rust-lang#65662

rust-lang#65662 (comment)
> I think a reasonable solution might be to do something like try to stat AT_CWD initially and if that fails with EPERM or ENOSYS we disable the syscall entirely, otherwise it's cached as always good to use.

r? @alexcrichton
@bors bors closed this as completed in f1d747a Oct 25, 2019
@jethrogb
Contributor Author

@mguillemot-tel can you confirm your issue was solved with the latest nightly?

@faern
Contributor

faern commented Oct 28, 2019

I can confirm it seems to work well in docker again 👍 We were having these issues with statx last week, and added --privileged to docker to solve it. Now we can remove it and it still works.

@mguillemot-tel

@jethrogb Yes, it's working perfectly again! Thank you!


10 participants