Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo keeps entire rustc/rustdoc stdout in memory #6197

Closed
Aaron1011 opened this issue Oct 21, 2018 · 14 comments · Fixed by #7838 or #6289
Closed

Cargo keeps entire rustc/rustdoc stdout in memory #6197

Aaron1011 opened this issue Oct 21, 2018 · 14 comments · Fixed by #7838 or #6289
Assignees
Labels
A-console-output Area: Terminal output, colors, progress bar, etc.

Comments

@Aaron1011
Copy link
Member

Aaron1011 commented Oct 21, 2018

Problem

When cargo invokes rustc or rustdoc (via cargo doc or cargo build, the child process's entire stdout is kept in memory. This can lead to the system's memory being completely exhausted if rustc or rustdoc produce a lot of output (e.g. when RUST_LOG is set).

Steps

  1. Create a file called yes.sh containing the string yes (If the yes program is unavailable, replace it with a script that endlessly writes to stdout)
  2. Run chmod +x ./yes.sh
  3. Run RUSTC=./yes.sh cargo build in any crate directory.
  4. Observe that cargo's memory usage quickly rises to consume all available memory.

Possible Solution(s)

cargo should clear each line of a child command's stdout from memory once it has been printed, since it won't be needed again.

Notes

Output of cargo version: cargo 1.29.0 (524a578d7 2018-08-05)
Tested on Arch Linux - this should be reproducible on any Linux-based system.

@ishitatsuyuki
Copy link
Contributor

I disagree. We don't have buffering yet, and yes if buffering is implemented your solution address a part of the problem.

The problem is simply that we don't have backpressure for consuming logs as we use an unbounded channel. yes is simply producing output too fast than we can forward. As a result, this is unlikely to actually blow things up in reality.

Fix: make this bounded.

let (tx, rx) = channel();

@Aaron1011
Copy link
Member Author

@ishitatsuyuki: This did come up for real - setting RUST_LOG=debug when compiling a large crate causes it for me. The yes example was just a convenient way to reliably reproduce it.

@ehuss
Copy link
Contributor

ehuss commented Oct 22, 2018

I think there are multiple issues here.

Simple commands like rustc -Vv use command.output() which buffers the entire output. exec_with_output is used in a few places to call rustc to gather information. The yes example hits these situations, so it may be a little artificial since these usually should emit only a small amount of output even with debugging.

During the normal compile execution, even though the output is streamed to the terminal, internally it buffers the entire result here. This is used in a few cases (like capturing build script output), but for the case of running rustc I don't believe it is necessary.

@alexcrichton
Copy link
Member

I think the correct fix here is to have Cargo discard rustc's output when it otherwise doesn't need to capture it, and I think that should be relatively easy to do?

bors added a commit that referenced this issue Nov 9, 2018
Avoid retaining all rustc output in memory.

There are still a few cases where all output is buffered. They are:
- Running discovery commands like `rustc -vV`, `rustc --print xxx`, etc.
- Build script output.
- Testsuite's debug `.stream()` function.

This should cover the concern of the issue, though, which is normal compilation.

Closes #6197
@bors bors closed this as completed in #6289 Nov 9, 2018
@Aaron1011
Copy link
Member Author

This issue is back - I haven't managed to find a minimized reproduction yet, but building the compiler with RUSTC_LOG=rustc::traits=debug ./x.py build causes cargo to consume multiple gigabytes of memory.

@Aaron1011
Copy link
Member Author

Can this issue be re-opened? I've run into this again today, and it's making debugging quite difficult.

@Mark-Simulacrum
Copy link
Member

This must have slipped into beta. Reopening; we might want to try and backport my PR to beta I guess?

@Mark-Simulacrum Mark-Simulacrum added the I-nominated-to-discuss To be discussed during issue triage on the next Cargo team meeting label Jan 16, 2020
@ehuss
Copy link
Contributor

ehuss commented Jan 18, 2020

Considering the release is in two weeks, can it wait? @Aaron1011, would it be possible to confirm that it is fixed? Set cargo in config.toml to a nightly version of cargo and see if you can reproduce the problem?

@Aaron1011
Copy link
Member Author

@ehuss: I can't seem to get rustup to use the newer version: cargo +stage1 --version prints cargo 1.42.0-nightly (ad3dbe10e 2020-01-13)

@ehuss
Copy link
Contributor

ehuss commented Jan 18, 2020

I'm not sure what you mean by "use the newer version" (that seems like a very recent nightly).

When you use rustup links (like +stage1), rustup has a built-in fallback to use the nightly toolchain when cargo is not found. By default, ./x.py does not build cargo, nor does it place it into the stage1 directory.

x.py by default uses the cargo version from the last beta. If you modify cargo in config.toml to point to a recent cargo (like /Users/eric/.rustup/toolchains/nightly-x86_64-apple-darwin/bin/cargo in my case), then x.py will use that cargo to build rustc.

@Aaron1011
Copy link
Member Author

@ehuss: I was able to reproduce the issue with that nightly (1.42.0-nightly (ad3dbe10e 2020-01-13)) - should I test a newer one as well?

@ehuss
Copy link
Contributor

ehuss commented Jan 21, 2020

There are two issues here, neither of which are particularly easy to fix.

  1. Fresh jobs do not spawn a new thread (src). This causes the main thread to block, so it fails to process Messages, so Read cached output line-by-line #7737 didn't actually fix the issue.

  2. As mentioned above, the message queue does not have any sort of back pressure, so the messages bunch up inside the queue when it can't print them fast enough.

@ehuss ehuss added the A-console-output Area: Terminal output, colors, progress bar, etc. label Jan 25, 2020
@ehuss ehuss self-assigned this Jan 25, 2020
@bors bors closed this as completed in 2bdc879 Mar 12, 2020
@LoganDark
Copy link

LoganDark commented Jul 27, 2020

image

This is cargo build.

While writing that sentence, it grew even more.

image

@ehuss
Copy link
Contributor

ehuss commented Jul 27, 2020

@LoganDark an you share more details about what you are doing? It might help to file a new issue. There are some known situations where Cargo buffers the output from rustc (such as the initial rustc --version discovery), but otherwise it shouldn't be buffering or consuming excessive memory. It also would help to share which version of cargo you are using.

@weihanglo weihanglo removed the I-nominated-to-discuss To be discussed during issue triage on the next Cargo team meeting label Oct 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-console-output Area: Terminal output, colors, progress bar, etc.
Projects
None yet
7 participants