Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include executed tests in the build metrics (and use a custom test display impl) #108659

Merged
merged 14 commits into from
Mar 21, 2023

Conversation

pietroalbini
Copy link
Member

The main goal of this PR is to include all tests executed in CI inside the build metrics JSON files. I need this for Ferrocene, and @Mark-Simulacrum expressed desire to have this as well to ensure all tests are executed at least once somewhere in CI.

Unfortunately implementing this required rewriting inside of bootstrap all of the code to render the test output to console. libtest supports outputting JSON instead of raw text, which we can indeed use to populate the build metrics. Doing that suppresses the console output though, and compared to rustc and Cargo the console output is not included as a JSON field.

Because of that, this PR had to reimplement both the "pretty" format (one test per line, with rust.verbose-tests = true), and the "terse" format (the wall of dots, with rust.verbose-tests = false). The current implementation should have the exact same output as libtest, except for the benchmark output. libtest's benchmark output is broken in the "terse" format, so since that's our default I slightly improved how it's rendered.

Also, to bring parity with libtest I had to introduce support for coloring output from bootstrap, using the same dependencies annotate-snippets uses. It's now possible to use builder.color_for_stdout(Color::Red, "text") and builder.color_for_stderr(Color::Green, "text") across all of bootstrap, automatically respecting the --color flag and whether the stream is a terminal or not.

I recommend reviewing the PR commit-by-commit.
r? @Mark-Simulacrum

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Mar 2, 2023
Behind the scenes Clippy uses compiletest-rs, which doesn't support the
--json flag we added to Rust's compiletest.
@ehuss
Copy link
Contributor

ehuss commented Mar 2, 2023

I noticed this removes the output from Cargo and places it at the end. For example, it now shows:

running 0 tests


test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 24.40µs


running 49 tests
.................................................

test result: ok. 49 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 4.69ms


running 19 tests
...................

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.82ms


running 15 tests
...............

test result: ok. 15 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 28.05ms

instead of:

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/ieee.rs (obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/ieee-c3f97d5f9513b69f)

running 49 tests
.................................................
test result: ok. 49 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/ppc.rs (obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/ppc-8d20debf1e8d2487)

running 19 tests
...................
test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_arena-d805e744b0c02363)

running 15 tests
...............
test result: ok. 15 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

Because it delays writing the stderr till the end. That can make it difficult to see which test suite is running at any one time. It also removes any sense of progress, as you just see it hang while cargo is building things (no progress bar, no "Compiling" messages, etc.). Is it at all possible to process both stderr and stdout at the same time?

Another concern is that delaying reading stderr could potentially deadlock. It's not too hard to generate enough data on stderr to fill the pipe buffer (particularly on platforms with smaller buf sizes). For example, running ./x.py test src/tools/cargo -v caused it to deadlock.

If you run with something like --nocapture, then it panics because it doesn't get something that looks like JSON. Would it maybe be possible for it to ignore lines that aren't JSON?

@pietroalbini
Copy link
Member Author

Because it delays writing the stderr till the end. That can make it difficult to see which test suite is running at any one time. It also removes any sense of progress, as you just see it hang while cargo is building things (no progress bar, no "Compiling" messages, etc.). Is it at all possible to process both stderr and stdout at the same time?

Another concern is that delaying reading stderr could potentially deadlock. It's not too hard to generate enough data on stderr to fill the pipe buffer (particularly on platforms with smaller buf sizes). For example, running ./x.py test src/tools/cargo -v caused it to deadlock.

I made the change to show the standard error only at the end because without it the message compiletest displays when something fails gets interleaved with the failure message. I agree with you that doesn't work, I'll try a different approach.

If you run with something like --nocapture, then it panics because it doesn't get something that looks like JSON. Would it maybe be possible for it to ignore lines that aren't JSON?

Right, forgot about --nocapture 😅 I'll do that.

@pietroalbini
Copy link
Member Author

So, to reiterate the problem with stderr, it was that compiletest emits a message at the end of a failing run to stderr, and that message would be always interleaved. The solutions that came to mind were:

  1. Capture the standard error and show it at the end. This caused the problems Eric rightfully pointed out.
  2. Spawn a separate thread that re-prints the captured standard error as it's outputted by the process, but have both the renderer and the stderr lock a mutex whenever they write. This would fully solve the problem, but by capturing Cargo doesn't color the messages anymore, and I'm not aware of any easy way to have a piped stream pretend to be a tty.
  3. Change compiletest to emit a special {"type":"compiletest_message","message":"foo"} instead of foo when --json is passed. This works, as the message would be displayed by the rendered, but it feels unelegant.
  4. Change compiletest to use println!() instead of eprintln!() to print that message. With the change of handling non-JSON stdout, this has the same effect as 3. but in a cleaner way.

I went with option 4., let me know if you want a different approach to be used or if there are approaches I didn't consider.

@bjorn3
Copy link
Member

bjorn3 commented Mar 4, 2023

An alternative to option 4 is to redirect stdout and stderr to the same pipe or file.

@pietroalbini
Copy link
Member Author

Wouldn't that have the same problem as option 2 (losing colors)?

@bjorn3
Copy link
Member

bjorn3 commented Mar 4, 2023

Right. Unless you did use a pseudo terminal, but that is not easily portable to Windows.

@Mark-Simulacrum
Copy link
Member

We only lose colors on the " Running tests/ppc.rs (obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/ppc-8d20debf1e8d2487)", right? Or on both that and test output? I feel like that line not being colored is.. fine, but then I typically don't find colors very useful anyway...

@pietroalbini
Copy link
Member Author

@Mark-Simulacrum we would lose:

  • Color on the "Running tests/pcc.rs)" line
  • Color on the "Compiling rustc_data_structures" lines
  • The Cargo progress bar for compilation

Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks like this should work.

It seems unfortunate to need to write such a large amount of code for rendering. I'm wondering if an alternate solution would be to extend --logfile to contain the information you want. I realize that would be a more difficult change to make, but perhaps something to pursue later.

To support that, I think there would need to be some mechanism to indicate the format for the logfile. I'm not sure if that is just the combination of --format and --logfile.

Another thing that would need to be addressed is having some sort of templated filename. Right now, --logfile will overwrite the file. That is a problem for running cargo test on something that runs multiple tests (like multiple integration tests, or something with doctests). Somehow one would need to be able to map the file to the kind of test being run.
(Or have --logfile extend the file, and have extra information about what is being run.)

src/bootstrap/Cargo.toml Outdated Show resolved Hide resolved
src/bootstrap/Cargo.toml Outdated Show resolved Hide resolved
@bors
Copy link
Contributor

bors commented Mar 21, 2023

⌛ Testing commit aacbd86 with merge 6667682...

@bors
Copy link
Contributor

bors commented Mar 21, 2023

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 6667682 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 21, 2023
@bors bors merged commit 6667682 into rust-lang:master Mar 21, 2023
@rustbot rustbot added this to the 1.70.0 milestone Mar 21, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (6667682): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.4% [0.3%, 0.4%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [0.3%, 0.4%] 2

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.3% [0.8%, 1.8%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.4% [-3.9%, -0.4%] 5
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.6% [-3.9%, 1.8%] 7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.4%, 0.6%] 4
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.5% [0.4%, 0.6%] 4

@pietroalbini pietroalbini deleted the pa-test-metrics branch March 22, 2023 08:37
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 25, 2023
…_output, r=pietroalbini

Bugfix: avoid panic on invalid json output from libtest

rust-lang#108659 introduces a custom test display implementation. It does so by using libtest to output json. The stdout is read and parsed; The code trims the line read and checks whether it starts with a `{` and ends with a `}`. If so, it concludes that it must be a json encoded `Message`. Unfortunately, this does not work in all cases:

- This assumes that tests running with `--nocapture` will never start and end lines with `{` and `}` characters
- Output is generated by issuing multiple `write_message` [statements](https://github.com/rust-lang/rust/blob/master/library/test/src/formatters/json.rs#L33-L60). Where only the last one issues a `\n`. This likely results in a race condition as we see multiple json outputs on the same line when running tests for the `x86_64-fortanix-unknown-sgx` target:
```
10:21:04 �[0m�[0m�[1m�[32m     Running�[0m tests/run-time-detect.rs (build/x86_64-unknown-linux-gnu/stage1-std/x86_64-fortanix-unknown-sgx/release/deps/run_time_detect-8c66026bd4b1871a)
10:21:04
10:21:04 running 1 tests
10:21:04 test x86_all ... ok
10:21:04 �[0m�[0m�[1m�[32m     Running�[0m tests/thread.rs (build/x86_64-unknown-linux-gnu/stage1-std/x86_64-fortanix-unknown-sgx/release/deps/thread-ed5456a7d80a6193)
10:21:04 thread 'main' panicked at 'failed to parse libtest json output; error: trailing characters at line 1 column 135, line: "{ \"type\": \"suite\", \"event\": \"ok\", \"passed\": 1, \"failed\": 0, \"ignored\": 0, \"measured\": 0, \"filtered_out\": 0, \"exec_time\": 0.000725911 }{ \"type\": \"suite\", \"event\": \"started\", \"test_count\": 1 }\n"', render_tests.rs:108:25
```

This PR implements a partial fix by being much more conservative of what it asserts is a valid json encoded `Message`. This prevents panics, but still does not resolve the race condition. A discussion is needed where this race condition comes from exactly and how it best can be avoided.

cc: `@jethrogb,` `@pietroalbini`
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 5, 2023
…huss

Validate `ignore` and `only` compiletest directive, and add human-readable ignore reasons

This PR adds strict validation for the `ignore` and `only` compiletest directives, failing if an unknown value is provided to them. Doing so uncovered 79 tests in `tests/ui` that had invalid directives, so this PR also fixes them.

Finally, this PR adds human-readable ignore reasons when tests are ignored due to `ignore` or `only` directives, like *"only executed when the architecture is aarch64"* or *"ignored when the operative system is windows"*. This was the original reason why I started working on this PR and rust-lang#108659, as we need both of them for Ferrocene.

The PR is a draft because the code is extremely inefficient: it calls `rustc --print=cfg --target $target` for every rustc target (to gather the list of allowed ignore values), which on my system takes between 4s and 5s, and performs a lot of allocations of constant values. I'll fix both of them in the coming days.

r? `@ehuss`
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 14, 2023
fix running Miri tests

This partially reverts rust-lang#108659 to fix rust-lang#110102: the Miri test runner does not support any flags, they are interpreted as filters instead which leads to no tests being run.

I have not checked any of the other test runners for whether they are having any trouble with these flags.

Cc `@pietroalbini` `@Mark-Simulacrum` `@jyn514`
oli-obk pushed a commit to oli-obk/miri that referenced this pull request Apr 17, 2023
fix running Miri tests

This partially reverts rust-lang/rust#108659 to fix rust-lang/rust#110102: the Miri test runner does not support any flags, they are interpreted as filters instead which leads to no tests being run.

I have not checked any of the other test runners for whether they are having any trouble with these flags.

Cc `@pietroalbini` `@Mark-Simulacrum` `@jyn514`
bors added a commit to rust-lang-ci/rust that referenced this pull request May 2, 2023
…ynchronization, r=pietroalbini

Ensure test library issues json string line-by-line

rust-lang#108659 introduces a custom test display implementation. It does so by using libtest to output json. The stdout is read line by line and parsed. The code trims the line read and checks whether it starts with a `{` and ends with a `}`.

Unfortunately, there is a race condition in how json data is written to stdout. The `write_message` function calls `self.out.write_all` repeatedly to write a buffer that contains (partial) json data, or a new line. There is no lock around the `self.out.write_all` functions. Similarly, the `write_message` function itself is called with only partial json data. As these functions are called from concurrent threads, this may result in json data ending up on the same stdout line. This PR avoids this by buffering the complete json data before issuing a single `self.out.write_all`.

(rust-lang#109484 implemented a partial fix for this issue; it only avoids that failed json parsing would result in a panic.)

cc: `@jethrogb,` `@pietroalbini`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants