Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sporadic kills with cargo run on macOS 12 #10060

Closed
ehuss opened this issue Nov 8, 2021 · 22 comments
Closed

Sporadic kills with cargo run on macOS 12 #10060

ehuss opened this issue Nov 8, 2021 · 22 comments
Labels
C-bug Category: bug O-macos OS: macOS

Comments

@ehuss
Copy link
Contributor

ehuss commented Nov 8, 2021

Problem

Since recently updating to macOS 12.0.1 (Monterey), I have been experiencing sporadic issues with security failures causing the kernel to kill the process before it starts.

In the logs, I see the following messages:

default 11:54:02.325853-0800 kernel ASP: Security policy would not allow process: 33133, /Users/eric/Proj/rust/toml-rs/scratch/a/target/debug/a-337f91948997ff1e
default 11:54:02.325856-0800 kernel ASP: Could not find reference 20142, process must have died

It happens very randomly. Sometimes it will happen constantly, and sometimes it might not happen for several minutes of repeated runs.

I'm not sure what "Could not find reference" means. Is that perhaps the pid of cargo itself?

Steps

Run touch src/main.rs ; cargo run a few times on a smallish project, like something that depends on toml.

Possible Solution(s)

I haven't had a chance to investigate more. This doesn't happen on all projects (I haven't had a problem with building cargo for example). It doesn't seem to matter where I run cargo run from (editor, terminal, etc.).

Notes

No response

Version

cargo 1.58.0-nightly (6c1bc24b8 2021-10-24)
release: 1.58.0
commit-hash: 6c1bc24b8b49d4bc965f67d7037906dc199c72b7
commit-date: 2021-10-24
host: x86_64-apple-darwin
libgit2: 1.3.0 (sys:0.13.23 vendored)
libcurl: 7.77.0 (sys:0.4.49+curl-7.79.1 system ssl:(SecureTransport) LibreSSL/2.8.3)
os: Mac OS 12.0.1 [64-bit]
@ehuss ehuss added C-bug Category: bug O-macos OS: macOS labels Nov 8, 2021
@weihanglo
Copy link
Member

Cannot reproduce it on 12.1 Beta (x86_64). I think it's related to code signing or so, as this article describing.

Could you check the attributes on the binary via xattr <binary>?

@ehuss
Copy link
Contributor Author

ehuss commented Nov 17, 2021

I can't reproduce it now. I'm going to close this, as I don't think anyone else has reported it. I suspect it was just a temporary problem on my machine. I'll investigate more if it happens again.

@ehuss ehuss closed this as completed Nov 17, 2021
@arempe93
Copy link

arempe93 commented Nov 19, 2021

Started learning Rust this week and this has been happening the entire time intermittently. Exact symptoms and touch/run repro, except on Big Sur.

Mac OS 11.6.0

ASP: Security policy would not allow process: 46608, /Users/andrew/Documents/src/personal/learn-rust/minigrep/target/debug/deps/minigrep

Running the xattr target/debug/minigrep @weihanglo suggested has no output.

Idk if this is helpful but i tried changing the repro to

rm -rf target; touch src/main.rs; cargo run

Even if the kill gets triggered, you can run the output binary directly ./target/debug/minigrep and it won't get killed.

@ehuss
Copy link
Contributor Author

ehuss commented Nov 19, 2021

Hm, strange, I can go ahead and reopen.

https://developer.apple.com/forums/thread/663456 contains some investigation. I wonder if it is related to the way cargo hard-links the executable just before running it.

@ehuss ehuss reopened this Nov 19, 2021
@extrawurst
Copy link

facing this issue aswell, happens very randomly

@e9510cba
Copy link

e9510cba commented Dec 6, 2021

+1, i see something like this

    Finished dev [unoptimized + debuginfo] target(s) in 0.64s
     Running `target/debug/name`
[1]    45741 killed     cargo r
cargo 1.56.0 (4ed5d137b 2021-10-04)
rustc 1.56.1 (59eed8a2a 2021-11-01)

Also interesting to note, I'm not on MacOS 12, only MacOS 11.5.2

@charlesroussel
Copy link
Contributor

charlesroussel commented Dec 7, 2021

I have the sameerror. I also got a log trace of when the issue happen.

kernel: (AppleMobileFileIntegrity) AMFI: '<user>/perso/Aoc/target/debug/advent' has no CMS blob?
kernel: (AppleMobileFileIntegrity) AMFI: '<user>/perso/Aoc/target/debug/advent': Unrecoverable CT signature issue, bailing out.
kernel: (AppleMobileFileIntegrity) AMFI: code signature validation failed.
syspolicyd: [com.apple.syspolicy.exec:default] Unable (errno: 2) to read file at <private> for process path: <private> library path: (null)
syspolicyd: [com.apple.syspolicy.exec:default] Terminating process due to Gatekeeper rejection: 45603, <private>
kernel: build_userspace_exit_reason: illegal flags passed from userspace (some masked off) 0x141, ns: 9, code 0x8
kernel: (AppleSystemPolicy) ASP: Sleep interrupted, signal 0x100
kernel: (AppleSystemPolicy) ASP: Security policy would not allow process: 45603, <user>/perso/Aoc/target/debug/deps/advent

Cargo version:

cargo --version
cargo 1.56.0 (4ed5d137b 2021-10-04)

Mac os version: Big Sur 11.5.2

@charlesroussel
Copy link
Contributor

Signature using codesign codesign -dv --verbose=1 target/debug/advent
On a compiled binary:

Executable=<user>/perso/Aoc/target/debug/advent
Identifier=advent-c9a39a776f9fca9d
Format=Mach-O thin (arm64)
CodeDirectory v=20400 size=165168 flags=0x20002(adhoc,linker-signed) hashes=5158+0 location=embedded
Signature=adhoc
Info.plist=not bound
TeamIdentifier=not set
Sealed Resources=none
Internal requirements=none

@hkratz
Copy link
Contributor

hkratz commented Dec 7, 2021

@charlesroussel The output looks fine. If you try to run it again, does it work? So far it looked more like a timing issue.

@charlesroussel
Copy link
Contributor

charlesroussel commented Dec 7, 2021

Yes it works if I run it again.

I am currently checking if it is related to hard linking/copy on write.
There is an open issue on the cmake gitlab: https://gitlab.kitware.com/cmake/cmake/-/issues/22372#note_982992. With signed binaries being copied and killed by the system.

From the issue:

It is suspected that this is due to the way macOS caches code signature information to detect tampering with the original signature.

Corresponding openRadar issue: https://openradar.appspot.com/FB8914243

@charlesroussel
Copy link
Contributor

charlesroussel commented Dec 7, 2021

Looking at cargo code on the copy/linking part. There is an interesting bit where the issue seems already detected.

#7821

} else if env::var_os("__CARGO_COPY_DONT_LINK_DO_NOT_USE_THIS").is_some() {
// This is a work-around for a bug in macOS 10.15. When running on
// APFS, there seems to be a strange race condition with
// Gatekeeper where it will forcefully kill a process launched via
// `cargo run` with SIGKILL. Copying seems to avoid the problem.
// This shouldn't affect anyone except Cargo's test suite because
// it is very rare, and only seems to happen under heavy load and
// rapidly creating lots of executables and running them.
// See https://github.com/rust-lang/cargo/issues/7821 for the
// gory details.
fs::copy(src, dst).map(|_| ())

@charlesroussel
Copy link
Contributor

Setting the env variable __CARGO_COPY_DONT_LINK_DO_NOT_USE_THIS is a working workaround

@hkratz
Copy link
Contributor

hkratz commented Dec 7, 2021

fs::copy() tries to use the CoW fclonefileat on macOS, so it should be quite fast on APFS filesystems. Could we just unconditionally use fs::copy() on macOS?

@charlesroussel
Copy link
Contributor

charlesroussel commented Dec 8, 2021

fs::copy() tries to use the CoW fclonefileat on macOS, so it should be quite fast on APFS filesystems. Could we just unconditionally use fs::copy() on macOS?

I don't see why not. I can do the PR if you're ok with it.

It definitely seems time-related or due to a race condition.

running with cargo clean -p advent && cargo build && ./target/debug/advent I get the error
running with cargo clean -p advent && cargo build && sleep 1 && ./target/debug/advent I have no issue anymore.

Then I wanted to check the complete log. I disabled the privacy logging to have the path.

syspolicyd: [com.apple.syspolicy.exec:default] Unable (errno: 2) to read file at <users>/perso/Aoc/target/debug/deps/advent for process path: <user>/perso/Aoc/target/debug/deps/advent library path: (null)
syspolicyd: [com.apple.syspolicy.exec:default] Terminating process due to Gatekeeper rejection: 82588, <user>/perso/Aoc/target/debug/deps/advent

It bugs me that it is searching for this file <users>/perso/Aoc/target/debug/deps/advent which is never created.
I would expect it to search for <users>/perso/Aoc/target/debug/advent or <users>/perso/Aoc/target/debug/deps/advent-c9a39a776f9fca9d

@hkratz
Copy link
Contributor

hkratz commented Dec 9, 2021

I don't see why not. I can do the PR if you're ok with it.

Sure. I am not on the cargo team, but I think that would be the right course of action.

@anton-yurchenko
Copy link

I've encountered that issue while playing with the following Rust code:

struct Logger {
    files: Vec<String>,
}

impl Logger {
    fn new() -> Self {
        Self {
            files: vec![String::from("output.log"), String::from("wrapper.log")],
        }
    }

    fn print_files(self) {
        println!("logger files: {:?}", self.files);
    }

    fn set_files(&mut self, files: Vec<String>) {
        self.files = files;
    }
}

fn main() {
    let mut l = Logger::new();
    println!("files: {:?}", l.files);
    l.set_files(vec![String::from("null.log")]);
    l.print_files();
}

The only thing I did was is renaming Logger to App (back and forth) and run cargo run in between to reproduce the issue. It is reproducible roughly every 30/40 executions for me.

The error:

➜  cargo run
   Compiling logger v0.1.0 (/Users/<path-to-the-project>/logger)
    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
     Running `target/debug/logger`
[1]    32219 killed     cargo run

System logs:

default	05:00:48.778079+0200	taskgated	UNIX error exception: 2
default	05:00:48.778814+0200	taskgated	no signature for pid=32219 (cannot make code: UNIX[No such file or directory])
error	05:00:48.779303+0200	syspolicyd	Unable (errno: 2) to read file at <private> for process path: <private> library path: (null)
error	05:00:48.779381+0200	syspolicyd	Terminating process due to Gatekeeper rejection: 32219, <private>
error	05:00:48.779468+0200	syspolicyd	failed to call driver: 0x3
default	05:00:48.779424+0200	kernel	build_userspace_exit_reason: illegal flags passed from userspace (some masked off) 0x141, ns: 9, code 0x8
default	05:00:48.779454+0200	kernel	ASP: Sleep interrupted, signal 0x100
default	05:00:48.779462+0200	kernel	ASP: Security policy would not allow process: 32219, /Users/<path-to-the-project>/logger/target/debug/deps/logger
default	05:00:48.779463+0200	kernel	ASP: Could not find reference 1632, process must have died

xattr target/debug/logger returns empty.
I am able to run the file directly with ./target/debug/logger

RustC Version: rustc 1.57.0 (f1edd0429 2021-11-29)
Cargo Version: cargo 1.57.0 (b2e52d7ca 2021-10-21)
Mac Version: macOS Monterey, Version 12.0.1

Rust installation:

brew install rustup
rustup-init
source $HOME/.cargo/env

Brew info:

rustup-init: stable 1.24.3 (bottled)
Rust toolchain installer
https://github.com/rust-lang/rustup
/usr/local/Cellar/rustup-init/1.24.3 (9 files, 7.8MB) *
  Poured from bottle on 2021-12-10 at 20:01:04
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/rustup-init.rb
License: Apache-2.0
==> Dependencies
Build: rust ✘
==> Analytics
install: 5,122 (30 days), 13,449 (90 days), 57,667 (365 days)
install-on-request: 5,011 (30 days), 13,176 (90 days), 56,328 (365 days)
build-error: 0 (30 days)

@charlesroussel
Copy link
Contributor

The number of execution before getting the error seems to be related to the binary size. This behaviour happens to me every 50 runs on a simple Helloworld and 1 in 4 on a bigger project.
I am suspecting that Gatekeeper or some other process is analyzing the file and caching the result. When hard linking, the file creation is instant. When running the process, Gatekeeper kills the process as it can't find the file in its cache

bors added a commit that referenced this issue Dec 16, 2021
Add workaround for sporadic kills when building on Macos

This is the workaround for the issue #10060
@charlesroussel
Copy link
Contributor

My PR was to linked to the issue. (I probably misconfigured something).
Should we close this issue or should we leave it open as what was merged is only a workaround ?

@alexcrichton
Copy link
Member

Sure let's close this for now and we can reconsider it if it pops up again.

@ehuss
Copy link
Contributor Author

ehuss commented Dec 23, 2021

Anyone following here: If you could let me know if you've been able to test out the latest nightly? Preferably in situations where it was failing before. I believe the fix is in nightly 2021-12-18.

@charlesroussel
Copy link
Contributor

I reproduced the issue first on stable (1.57) using
watch -e "cargo clean -p advent && cargo run

Then I called the same query for 2 hours on nightly, I could not reproduce.

─❯ cargo --version --verbose
cargo 1.59.0-nightly (fcef61230 2021-12-17)
release: 1.59.0-nightly
commit-hash: fcef61230c3b6213b6b0d233a36ba4ebd1649ec3
commit-date: 2021-12-17
host: aarch64-apple-darwin
libgit2: 1.3.0 (sys:0.13.23 vendored)
libcurl: 7.64.1 (sys:0.4.51+curl-7.80.0 system ssl:(SecureTransport) LibreSSL/2.8.3)
os: Mac OS 11.5.2 [64-bit]

@acdibble
Copy link

acdibble commented Jan 3, 2022

I am able to recreate this quite reliably even with small projects. I haven't been able to recreate on nightly.

$ rustup run nightly rustc --version
rustc 1.59.0-nightly (8f3238f89 2022-01-02)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug O-macos OS: macOS
Projects
None yet
Development

No branches or pull requests

10 participants