Docker seccomp policy incompatible with glibc 2.34 #3812

Closed
1 of 7 tasks
fweimer-rh opened this issue Jul 29, 2021 · 18 comments
Labels
Area: Containers investigate Collect additional information, like space on disk, other tool incompatibilities etc. OS: Ubuntu

Comments

@fweimer-rh

fweimer-rh commented Jul 29, 2021

Description

glibc 2.34 will try to use clone3 if available to enable hardware-assisted security hardening on recent x86-64 CPUs. Presently this does not work in Azure DevOps and GitHub Actions because the clone3 system call is blocked by the seccomp policy.

This issue may have been introduced by a Moby update to a version that includes moby/moby#41889. Before that, the ENOSYS kludge added in opencontainers/runc#2750 should have prevented this failure.

For actions that run docker create, it is possible to work around this by specifying --security-opt seccomp=unconfined, but no such option exists for docker build for some reason. (See moby/moby#34454.)

Virtual environments affected

  • Ubuntu 16.04
  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11
  • Windows Server 2016
  • Windows Server 2019

Image version and build link

Azure DevOps

Environment: ubuntu-20.04
Version: 20210718.1
Included Software: https://github.com/actions/virtual-environments/blob/ubuntu20/20210718.1/images/linux/Ubuntu2004-README.md
Image Release: https://github.com/actions/virtual-environments/releases/tag/ubuntu20%2F20210718.1
Current image version: '20210718.1'
Agent running as: 'vsts'

https://dev.azure.com/dnceng/public/_build/results?buildId=1259364&view=logs&j=91749cbd-6d9d-54ec-5351-cfcf55a8b75c&t=8f482fb7-2227-41c9-9817-1c0d425d6aed&l=6

GitHub Actions

Environment: ubuntu-20.04
Version: 20210718.1
Included Software: https://github.com/actions/virtual-environments/blob/ubuntu20/20210718.1/images/linux/Ubuntu2004-README.md
Image Release: https://github.com/actions/virtual-environments/releases/tag/ubuntu20%2F20210718.1

https://github.com/freeipa/freeipa-container/runs/3163951309

Is it regression?

No response

Expected behavior

clone3 should succeed or fail with ENOSYS. In the latter case, glibc will transparently use a fallback mechanism.

Actual behavior

clone3 fails with EPERM, causing thread creation to fail.
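
The difference between the two error codes can be checked from inside a container. The following is a minimal sketch and not part of the original report. It assumes Linux on x86-64, where clone3 is taken to be system call number 435 (verify this for other architectures), and it calls the system call through ctypes with a null argument pointer so that no process is created. The names `classify` and `probe_clone3` are hypothetical helpers.

```python
import ctypes
import errno

SYS_CLONE3 = 435  # assumption: clone3 system call number on x86-64 Linux


def classify(err: int) -> str:
    """Describe what an errno from a probing clone3 call implies."""
    if err == errno.ENOSYS:
        return "unavailable: glibc falls back to clone"
    if err == errno.EPERM:
        return "blocked: thread creation fails"
    return "available: the kernel handled the call"


def probe_clone3() -> int:
    """Call clone3 with invalid arguments and return the resulting errno."""
    libc = ctypes.CDLL(None, use_errno=True)
    ctypes.set_errno(0)
    # A null pointer with size 0 is rejected before anything is cloned,
    # so the probe should have no side effects.
    libc.syscall(ctypes.c_long(SYS_CLONE3), None, ctypes.c_size_t(0))
    return ctypes.get_errno()
```

Running `print(classify(probe_clone3()))` in the affected image should report the blocked case if the seccomp policy rejects clone3 with EPERM as described above.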

Typical error messages for Fedora are:

Errors during downloading metadata for repository 'rawhide':
  - Curl error (6): Couldn't resolve host name for https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 [getaddrinfo() thread failed to start]

Or with the Python reproducer below:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.10/threading.py", line 928, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

Repro steps

For testing purposes, Fedora rawhide (registry.fedoraproject.org/fedora:rawhide) already contains a glibc 2.34 snapshot which uses clone3 (with a fallback to clone if clone3 fails with ENOSYS). For example, this command should print 123:

$ docker run registry.fedoraproject.org/fedora:rawhide python3 -c 'import threading; threading.Thread(None, lambda: print(123)).start()'
123

EDIT Command fixed to use the Fedora registry.
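
For CI scripts that need to detect the failure rather than just print a value, the one-liner can be expanded into a small check. This is only a sketch; the function name `can_start_thread` is hypothetical, and it relies on the `RuntimeError` shown in the traceback above.

```python
import threading


def can_start_thread() -> bool:
    """Return True if a new thread can be started and runs to completion."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(123))
    try:
        worker.start()
    except RuntimeError:
        # Raised as "can't start new thread" when thread creation fails.
        return False
    worker.join()
    return seen == [123]
```

A job could call `can_start_thread()` early and fail with a clear message instead of the misleading DNS error.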

@Darleev Darleev added Area: Containers OS: Ubuntu investigate Collect additional information, like space on disk, other tool incompatibilities etc. and removed needs triage labels Jul 29, 2021
@Darleev
Contributor

Darleev commented Jul 29, 2021

Hello @fweimer-rh,
We will investigate this issue.

@omajid

omajid commented Jul 29, 2021

@wfurt @mthalman @MichaelSimons this is the issue that we were seeing in dotnet/dotnet-buildtools-prereqs-docker#484

@dsame dsame self-assigned this Jul 30, 2021
@dsame
Contributor

dsame commented Jul 30, 2021

@fweimer-rh the repro build prints 123
https://github.com/dsame/actions-demo/runs/3200712143?check_suite_focus=true

As I understand it, this means that with the Docker implementation currently used by GitHub Actions on Ubuntu 20.04, the glibc 2.34 bundled with fedora:rawhide handles the clone3 return code properly.

Can you please clarify which pipeline problem we are expected to fix on our side?

@fweimer-rh
Author

Sorry about that. It looks like your docker binary does not do alias matching for fedora, so it goes to Docker Hub, where fedora:rawhide is something else (presumably mirrored from the Fedora registry occasionally). Try this command instead:

docker run registry.fedoraproject.org/fedora:rawhide python3 -c 'import threading; threading.Thread(None, lambda: print(123)).start()'

Perhaps also log the glibc version to be sure:

docker run registry.fedoraproject.org/fedora:rawhide rpm -q glibc
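
On images without rpm, the same check can be sketched in Python. This is an illustration under assumptions: the helper names are hypothetical, and `os.confstr("CS_GNU_LIBC_VERSION")` is expected to return a string such as `glibc 2.34` only on glibc-based systems (it may raise an error elsewhere).

```python
import os


def parse_glibc_version(text: str) -> tuple:
    """Turn a string such as 'glibc 2.34' into (2, 34)."""
    name, _, version = text.partition(" ")
    if name != "glibc":
        raise ValueError(f"not a glibc version string: {text!r}")
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def uses_clone3(text: str) -> bool:
    """glibc 2.34 and later try clone3 first, per the report above."""
    return parse_glibc_version(text) >= (2, 34)


def running_glibc_version() -> str:
    """Ask the C library for its version string (glibc systems only)."""
    return os.confstr("CS_GNU_LIBC_VERSION")
```

Inside the container, `uses_clone3(running_glibc_version())` indicates whether the image is likely to be affected.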

@dsame
Contributor

dsame commented Jul 30, 2021

@fweimer-rh
OK, but it is still unclear what should be done on the GitHub Actions side.

First, the snippet can be modified to run:
docker run --security-opt seccomp=unconfined registry.fedoraproject.org/fedora:rawhide python3 -c 'import threading; threading.Thread(None, lambda: print(123)).start()'

Do you want us to change the default seccomp profile? Should this request not be addressed to the Moby team? Honestly, I can hardly imagine how we could change the Docker virtualisation model from outside the Moby codebase.

@fweimer-rh
Author

--security-opt seccomp=unconfined is not available for the docker build command, so users can't really work around this. The system-wide policy has to be changed for that.
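
To illustrate what such a policy change could look like, here is a hedged sketch that adds a rule to a seccomp profile so that clone3 fails with ENOSYS instead of EPERM. The field names (`names`, `action`, `errnoRet`, `defaultAction`) follow Docker's seccomp profile JSON format as I understand it and should be checked against the Docker documentation; the function name and the minimal example profile are hypothetical. Whether this matches the change actually made upstream is not confirmed here.

```python
import copy
import json

ENOSYS_LINUX = 38  # value of ENOSYS on Linux


def with_clone3_enosys(profile: dict) -> dict:
    """Return a copy of a seccomp profile in which clone3 fails with ENOSYS."""
    patched = copy.deepcopy(profile)
    patched.setdefault("syscalls", []).append(
        {
            "names": ["clone3"],
            "action": "SCMP_ACT_ERRNO",
            "errnoRet": ENOSYS_LINUX,
        }
    )
    return patched


# Hypothetical minimal profile, for illustration only; a real profile
# lists hundreds of allowed system calls.
example = {"defaultAction": "SCMP_ACT_ERRNO", "syscalls": []}
print(json.dumps(with_clone3_enosys(example), indent=2))
```

With a rule like this, glibc would see ENOSYS and use its clone fallback, which is the expected behavior described in the report.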

I noticed that you do not use the docker.io packages from Ubuntu, so I assumed you were crafting your own thing.

@dsame
Contributor

dsame commented Jul 30, 2021

@fweimer-rh GitHub Actions uses Moby "as is", without further tweaking:

https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/docker-moby.sh

This is why it would be more efficient to address the issue to the Moby team: https://github.com/moby/moby/issues

@fweimer-rh
Author

@dsame The script doesn't show the repository being used, and some log output contains a +azure marker in version strings.

Anyway, based on what you are saying, the images should inherit the fix in moby/moby#42680 in due course then? Thanks!

@dsame
Contributor

dsame commented Jul 31, 2021

@fweimer-rh yes, of course, the PR merged into the Moby repo will be deployed to the Azure Actions images.

@dsame dsame closed this as completed Jul 31, 2021
rcaelers added a commit to rcaelers/workrave-build-containers that referenced this issue Aug 11, 2021
omajid added a commit to omajid/dotnet-regular-tests that referenced this issue Aug 26, 2021
Fedora 32 is EOL. Add Fedora 34 and 35.

Disable security to work around
actions/runner-images#3812

Remove rawhide, because it's broken.
omajid added a commit to redhat-developer/dotnet-regular-tests that referenced this issue Aug 26, 2021
Fedora 32 is EOL. Add Fedora 34 and 35.

Disable security to work around
actions/runner-images#3812

Remove rawhide, because it's broken.
@gotmax23

gotmax23 commented Sep 7, 2021

Would it be possible to backport the aforementioned PR into the Azure Moby package that GitHub runners use?

ppisar added a commit to ppisar/libmodulemd that referenced this issue Sep 9, 2021
GitHub Actions service has not yet updated its container seccomp
policy to recognize a new clone3() syscall and OpenMandriva Cooker
upgraded glibc which utilizes it.

actions/runner-images#3812
ldorau added a commit to ldorau/rpma that referenced this issue Sep 9, 2021
Move all Fedora Rawhide CI builds to Nightly_Experimental,
since they have been failing for a long time,
because of the following error:

Curl error (6): Couldn't resolve host name for \
  https://mirrors.fedoraproject.org/metalink?repo=rawhide&arch=x86_64 \
    [getaddrinfo() thread failed to start]

See:
https://www.mail-archive.com/[email protected]/msg169919.html
actions/runner-images#3812

Signed-off-by: Lukasz Dorau <[email protected]>
@junghans

Same issue now on ubuntu:devel (e.g. see https://github.com/kokkos/ci-containers/runs/3628868286)

ppisar added a commit to ppisar/libmodulemd that referenced this issue Oct 1, 2021
GitHub Actions service has not yet updated its container seccomp
policy to recognize a new clone3() syscall and openSUSE Tumbleweed
upgraded glibc to a version which utilizes it.

actions/runner-images#3812
jeeb added a commit to jeeb/mpv that referenced this issue Oct 2, 2021
This CI builder bases on openSUSE Tumbleweed, and recently had
its glibc updated. This led to new syscalls such as 'clone3' not
being allowed through the security layer.

Can be reverted after Github Actions updates their security policy.

actions/runner-images#3812
mcnewton added a commit to FreeRADIUS/freeradius-server that referenced this issue Oct 11, 2021
@vt-alt

vt-alt commented Oct 20, 2021

Docker images of distributions that contain glibc 2.34 just don't work on GA.
Where should we report if not here?

@pascallj

@vt-alt The problem is well known by now and there is no need to bug the developers anymore. A fix is in the pipeline for Docker.

Your distro maintainer might include a temporary fix for glibc, as Ubuntu has done with Ubuntu Impish. But rolling distros tend not to do those things. It isn't their problem anyway.

And if you want to run your GitHub Actions with an Ubuntu 20.04 or 21.04 base and a Docker image which isn't patched, there are some (temporary) solutions posted in the comments above and in the linked issues.

@gotmax23

> @vt-alt The problem is well known by now and there is no need to bug the developers anymore. A fix is in the pipeline for Docker.

I already submitted a patch for Fedora's moby-engine packages, and I believe @pascallj has done the same for Ubuntu's packages. I asked GitHub to patch their Docker packages about a month and a half ago. There's not much I can do, because GitHub uses different Docker packages that are provided by Microsoft.

Thanks,
Maxwell

@vt-alt

vt-alt commented Oct 20, 2021

@pascallj @gotmax23 Thanks, that's good to know!

vt-alt added a commit to vt-alt/lkrg that referenced this issue Oct 20, 2021
When target system updated to glibc-2.34 (as for ALT Linux) it starts
to use new syscall `clone3', which is not enabled in Docker seccomp
filter, causing run failures. GA issue [1].

Disable Docker seccomp filtering since we are in throwable virtual
environment anyway and don't need that protection.

Link: actions/runner-images#3812 [1]
Fixes: lkrg-org#121
Signed-off-by: Vitaly Chikunov <[email protected]>
Adam-pi3 pushed a commit to lkrg-org/lkrg that referenced this issue Oct 21, 2021
When target system updated to glibc-2.34 (as for ALT Linux) it starts
to use new syscall `clone3', which is not enabled in Docker seccomp
filter, causing run failures. GA issue [1].

Disable Docker seccomp filtering since we are in throwable virtual
environment anyway and don't need that protection.

Link: actions/runner-images#3812 [1]
Fixes: #121
Signed-off-by: Vitaly Chikunov <[email protected]>
mattock added a commit to mattock/openvpn-vagrant that referenced this issue Nov 3, 2021
It can't be disabled until a bug has been fixed upstream

URL: actions/runner-images#3812
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1988199
Signed-off-by: Samuli Seppänen <[email protected]>
ppisar added a commit to ppisar/libmodulemd that referenced this issue Nov 23, 2021
GitHub Actions service has not yet updated its container seccomp
policy to recognize a new clone3() syscall and openSUSE Tumbleweed
upgraded glibc to a version which utilizes it.

actions/runner-images#3812
cron2 pushed a commit to OpenVPN/openvpn-vagrant that referenced this issue Feb 13, 2022
It can't be disabled until a bug has been fixed upstream

URL: actions/runner-images#3812
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1988199
Signed-off-by: Samuli Seppänen <[email protected]>
ppisar added a commit to ppisar/libmodulemd that referenced this issue Mar 28, 2023
The cross-architecture tests fail in glib2:

    Exec failed with: Failed to close file descriptor for child process (Operation not permitted)

That's the 2-year-old GitHub bug
<actions/runner-images#3812>. Some other
projects which run ubuntu-latest reported that the bug is already
fixed. We used ubuntu-20.04. ubuntu-latest is ubuntu-22.04. Try that.
ppisar added a commit to ppisar/libmodulemd that referenced this issue Mar 28, 2023
.github/workflows/multiarch.yaml tests started to fail in glib2:

    Exec failed with: Failed to close file descriptor for child process (Operation not permitted)

That's the 2-year-old GitHub bug
<actions/runner-images#3812>. Some other
projects which run ubuntu-latest reported that the bug is already
fixed. We used ubuntu-20.04. ubuntu-latest is ubuntu-22.04 now. Try that.
ppisar added a commit to ppisar/libmodulemd that referenced this issue Mar 28, 2023
When glibc-2.34 started to use clone3() syscall, seccomp policy became
violated and glib2 fork+exec functions failed. This discrepancy was
worked around with passing "--security-opt seccomp=unconfined" option
to a container manager.

Now when Microsoft fixed the policy in ubuntu-22.04 images
(ubuntu-20.04 remains broken) and we moved to ubuntu-22.04, the
workaround is not needed. This patch removes it.

<actions/runner-images#3812>
nicolestandifer3 added a commit to nicolestandifer3/regular-tests-dotnet that referenced this issue Aug 6, 2023
Fedora 32 is EOL. Add Fedora 34 and 35.

Disable security to work around
actions/runner-images#3812

Remove rawhide, because it's broken.