libct/nsenter: namespace the bindfd shuffle #3599

Merged: 5 commits merged on Jul 4, 2023

Commits on Jul 4, 2023

  1. chore(libct/nsenter): extract utility code

    ...from nsexec.c so it can be used in cloned_binary.c.
    
    Signed-off-by: Cory Snider <[email protected]>
    corhere authored and cyphar committed Jul 4, 2023
    Commit: 35fddfd
  2. libct/nsenter: annotate write_log() prototype

    ...so the compiler can warn about mismatches between the format string
    and varargs.
    
    Signed-off-by: Cory Snider <[email protected]>
    corhere authored and cyphar committed Jul 4, 2023
    Commit: 890dcee
  3. libct/nsenter: refactor ipc funcs for reusability

    Modify receive_fd() and send_fd() so they can be more readily reused in
    cloned_binary.c. Change receive_fd() to have a single responsibility:
    receiving and returning a single file descriptor over a UNIX domain
    socket. Make send_fd() usable in precarious execution contexts, such as
    a clone(CLONE_VFORK|CLONE_VM) "thread" where allocating heap memory or
    calling exit() would be dangerous.
    
    Signed-off-by: Cory Snider <[email protected]>
    corhere authored and cyphar committed Jul 4, 2023
    Commit: 8f67178
  4. libct/nsenter: set FD_CLOEXEC on received fd

    Signed-off-by: Cory Snider <[email protected]>
    corhere authored and cyphar committed Jul 4, 2023
    Commit: 3b191ff
  5. libct/nsenter: namespace the bindfd shuffle

    Processes can watch /proc/self/mounts or /mountinfo, and the kernel
    will notify them whenever the namespace's mount table is modified.
    Once notified, a process still needs to read and parse the mountinfo
    to determine what changed. Many such processes, including udisksd and
    systemd < v248, make no attempt to rate-limit how often they re-parse
    the mountinfo in response. This tends not to be a problem on many
    systems, where mount tables are small and mounting and unmounting are
    uncommon. Every runC exec that successfully uses the try_bindfd
    container-escape mitigation performs two mount()s and one umount() in
    the host's mount namespace, causing every mount-watching process to
    wake up and parse the mountinfo file three times in a row.
    Consequently, 'exec' health checks on containers have a
    larger-than-expected impact on system load when such mount-watching
    daemons are running. Furthermore, the size of the host's mount table
    tends to be proportional to the number of OCI containers, since each
    container's rootfs requires its own mount. The number of health-check
    execs per unit time is also proportional to the number of containers,
    and each exec forces every watcher to re-parse that table three times.
    Therefore, on systems with mount-watching processes, the system load
    increases *quadratically* with the number of running containers that
    use health checks!
    
    Prevent runC from incidentally modifying the host's mount namespace for
    container-escape mitigations by setting up the mitigation in a temporary
    mount namespace.
    
    Signed-off-by: Cory Snider <[email protected]>
    corhere authored and cyphar committed Jul 4, 2023
    Commit: 017d699
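
The second commit relies on the GCC/Clang format function attribute. The following is a minimal sketch, assuming a hypothetical buffer-writing logger named log_to_buf() rather than the real write_log(), whose exact prototype is not shown on this page. The attribute marks parameter 4 as a printf-style format string and parameter 5 onward as its varargs, so -Wformat can check every call site at compile time.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logger, loosely modeled on write_log(). The format attribute
 * says: parameter 4 is a printf-style format string and the variadic
 * arguments start at parameter 5, so GCC/Clang -Wformat checks each call. */
static int log_to_buf(char *buf, size_t len, int level, const char *format, ...)
	__attribute__((format(printf, 4, 5)));

static int log_to_buf(char *buf, size_t len, int level, const char *format, ...)
{
	va_list args;
	int prefix, n;

	/* Write a "level=N " prefix, then the caller's formatted message. */
	prefix = snprintf(buf, len, "level=%d ", level);
	if (prefix < 0 || (size_t)prefix >= len)
		return prefix;

	va_start(args, format);
	n = vsnprintf(buf + prefix, len - (size_t)prefix, format, args);
	va_end(args);

	/* Total characters written (excluding the NUL), or a negative error. */
	return n < 0 ? n : prefix + n;
}
```

With the attribute in place, a call such as log_to_buf(b, sizeof(b), 1, "pid=%s", 42) draws a -Wformat warning from GCC or Clang instead of silently invoking undefined behavior at run time.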
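
The third and fourth commits concern passing a file descriptor over a UNIX domain socket. The sketch below shows the standard SCM_RIGHTS technique; send_fd_sketch() and receive_fd_sketch() are hypothetical stand-ins, not the functions in nsexec.c. The sender uses only stack memory and reports errors as -errno instead of calling exit(), which is the property the third commit describes for a clone(CLONE_VFORK|CLONE_VM) child. The receiver returns exactly one descriptor. As one possible way to get FD_CLOEXEC on it, the sketch passes the Linux-specific MSG_CMSG_CLOEXEC flag to recvmsg(); the actual commit may set the flag differently, for example with fcntl().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for one fd, aligned for struct cmsghdr. */
union fd_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send fd over a connected AF_UNIX socket. No heap allocation and no
 * exit(): failures are returned as -errno. */
static int send_fd_sketch(int sockfd, int fd)
{
	char byte = 0; /* stream sockets need at least one byte of real data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg_buf cbuf;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf.buf,
		.msg_controllen = sizeof(cbuf.buf),
	};
	struct cmsghdr *cmsg;

	memset(&cbuf, 0, sizeof(cbuf));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sockfd, &msg, 0) < 0 ? -errno : 0;
}

/* Receive exactly one fd. MSG_CMSG_CLOEXEC (Linux) makes the kernel install
 * it with FD_CLOEXEC already set. Returns the fd or -errno. */
static int receive_fd_sketch(int sockfd)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg_buf cbuf;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf.buf,
		.msg_controllen = sizeof(cbuf.buf),
	};
	struct cmsghdr *cmsg;
	ssize_t n;
	int fd;

	n = recvmsg(sockfd, &msg, MSG_CMSG_CLOEXEC);
	if (n < 0)
		return -errno;
	if (n == 0)
		return -ECONNRESET; /* peer closed without sending anything */

	/* Accept only a single SCM_RIGHTS message carrying one int. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -EBADMSG;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Returning -errno rather than exiting lets a caller in a shared-memory child decide how to report the failure, and it keeps the send path free of allocations.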
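
The quadratic growth claimed in the fifth commit can be made concrete with a back-of-the-envelope model. This is an illustrative sketch with assumed parameters, not a measurement: every container runs the same number of health-check execs per interval, each exec triggers events_per_exec changes to the host's mount table (3 for the host-namespace shuffle: two mount()s and one umount()), and each change makes a watcher re-read a mountinfo whose line count is a fixed base_mounts plus one rootfs mount per container.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative cost model (assumed, not measured): mountinfo lines that one
 * watcher must parse per health-check interval. */
static uint64_t mountinfo_lines_parsed(uint64_t containers,
				       uint64_t base_mounts,
				       uint64_t checks_per_container,
				       uint64_t events_per_exec)
{
	/* Execs per interval grow linearly with the container count... */
	uint64_t execs = containers * checks_per_container;
	/* ...and so does the table each notification forces a re-parse of. */
	uint64_t table_lines = base_mounts + containers;

	return execs * events_per_exec * table_lines; /* ~ containers^2 */
}
```

With these assumed numbers (base_mounts = 30, one check per container, 3 events per exec), going from 10 to 100 containers raises the per-watcher work from 1,200 to 39,000 lines per interval, a 32.5-fold increase for a 10-fold increase in containers. Setting events_per_exec to 0 models the host's view once the mitigation's mounts happen in a temporary mount namespace.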
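
The core idea of the fifth commit is to do the mounts in a temporary mount namespace so the host's mount table never changes. It can be sketched as follows. This is a simplified, hypothetical helper (run_in_private_mount_ns() and demo_task() are not runc functions): it uses plain fork() instead of the clone(CLONE_VFORK|CLONE_VM) child mentioned in the third commit, and it leaves the actual bind-mount work to a caller-supplied callback. The child calls unshare(CLONE_NEWNS) and then marks / as recursively private, so its mount events do not propagate back to peer mounts in the host namespace. This requires CAP_SYS_ADMIN; without it, setup typically fails and the helper returns -EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run fn(arg) in a child that first enters a fresh,
 * private mount namespace. Mounts and unmounts made by fn change only the
 * child's copy of the mount table, so watchers of the host's mountinfo are
 * not notified. Returns fn's exit status (0-255) or -errno if setup failed. */
static int run_in_private_mount_ns(int (*fn)(void *), void *arg)
{
	int errpipe[2], status, child_errno = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(errpipe) < 0)
		return -errno;

	pid = fork();
	if (pid < 0) {
		int saved = errno;
		close(errpipe[0]);
		close(errpipe[1]);
		return -saved;
	}
	if (pid == 0) {
		close(errpipe[0]);
		/* Copy the parent's mount table into a new namespace... */
		if (unshare(CLONE_NEWNS) == 0 &&
		    /* ...and stop our mount events propagating to the host. */
		    mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == 0) {
			close(errpipe[1]);
			_exit(fn(arg) & 0xff);
		}
		/* Report the setup errno to the parent, then bail out. */
		child_errno = errno;
		if (write(errpipe[1], &child_errno, sizeof(child_errno)) < 0)
			_exit(126);
		_exit(127);
	}

	close(errpipe[1]);
	/* Reads 0 bytes (EOF) on successful setup, an errno on failure. */
	n = read(errpipe[0], &child_errno, sizeof(child_errno));
	close(errpipe[0]);

	if (waitpid(pid, &status, 0) < 0)
		return -errno;
	if (n == (ssize_t)sizeof(child_errno))
		return -child_errno;
	if (!WIFEXITED(status))
		return -ECHILD;
	return WEXITSTATUS(status);
}

/* Hypothetical task; a real one would perform its bind mounts here. */
static int demo_task(void *arg)
{
	return *(int *)arg;
}
```

In a fuller version, the callback would perform the bind mount, remount, and open steps, then hand the resulting descriptor back to the parent using an fd-passing helper over a socketpair. That hand-off is omitted here.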