Include executed tests in the build metrics (and use a custom test display impl) #108659

pietroalbini · 2023-03-02T15:32:08Z

The main goal of this PR is to include all tests executed in CI inside the build metrics JSON files. I need this for Ferrocene, and @Mark-Simulacrum expressed desire to have this as well to ensure all tests are executed at least once somewhere in CI.
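To make the "executed at least once somewhere in CI" goal concrete, here is a minimal sketch of the kind of check that per-job test records would allow. The `Outcome` type, the `never_executed` function, and the in-memory job layout are all hypothetical; the actual build metrics JSON schema is not shown in this thread.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Outcome of one test in one CI job (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Outcome {
    Passed,
    Failed,
    Ignored,
}

/// Returns the names of tests that no job actually executed, i.e. tests
/// that were ignored in every job where they appeared.
fn never_executed(jobs: &[Vec<(&str, Outcome)>]) -> BTreeSet<String> {
    // For each test name, remember whether any job ran it.
    let mut ran: BTreeMap<&str, bool> = BTreeMap::new();
    for job in jobs {
        for &(name, outcome) in job {
            let executed = outcome != Outcome::Ignored;
            *ran.entry(name).or_insert(false) |= executed;
        }
    }
    ran.into_iter()
        .filter(|&(_, executed)| !executed)
        .map(|(name, _)| name.to_string())
        .collect()
}

fn main() {
    // Two made-up CI jobs with overlapping test lists.
    let linux = vec![("a", Outcome::Passed), ("b", Outcome::Ignored)];
    let windows = vec![
        ("a", Outcome::Ignored),
        ("b", Outcome::Ignored),
        ("c", Outcome::Failed),
    ];
    for name in never_executed(&[linux, windows]) {
        println!("never executed in CI: {name}");
    }
}
```

A failed test still counts as executed here; only tests ignored everywhere are reported.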

Unfortunately implementing this required rewriting inside of bootstrap all of the code to render the test output to console. libtest supports outputting JSON instead of raw text, which we can indeed use to populate the build metrics. Doing that suppresses the console output though, and compared to rustc and Cargo the console output is not included as a JSON field.

Because of that, this PR had to reimplement both the "pretty" format (one test per line, with rust.verbose-tests = true), and the "terse" format (the wall of dots, with rust.verbose-tests = false). The current implementation should have the exact same output as libtest, except for the benchmark output. libtest's benchmark output is broken in the "terse" format, so since that's our default I slightly improved how it's rendered.
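As an illustration of what the "terse" renderer has to do, the sketch below turns a list of outcomes into a wall of dots with a progress counter at each line wrap. It is not the code from this PR: the `Outcome` type, the `F` and `i` markers, the counter format, and the wrap width passed in by the caller are assumptions made for the example.

```rust
/// Outcome of a single test (hypothetical type for this sketch).
#[derive(Clone, Copy)]
enum Outcome {
    Ok,
    Failed,
    Ignored,
}

/// Renders one character per test and wraps after every `width` tests,
/// appending a `done/total` counter before each line break.
fn render_terse(outcomes: &[Outcome], width: usize) -> String {
    assert!(width > 0, "width must be positive");
    let total = outcomes.len();
    let mut out = String::new();
    for (i, outcome) in outcomes.iter().enumerate() {
        out.push(match outcome {
            Outcome::Ok => '.',
            Outcome::Failed => 'F',
            Outcome::Ignored => 'i',
        });
        let done = i + 1;
        // Wrap only when more tests follow, so the last line has no counter.
        if done % width == 0 && done < total {
            out.push_str(&format!(" {done}/{total}\n"));
        }
    }
    out.push('\n');
    out
}

fn main() {
    let outcomes = [
        Outcome::Ok,
        Outcome::Ok,
        Outcome::Failed,
        Outcome::Ignored,
        Outcome::Ok,
    ];
    print!("{}", render_terse(&outcomes, 2));
}
```

The "pretty" format would instead emit one `test <name> ... <result>` line per event, so it needs the test name in addition to the outcome.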

Also, to bring parity with libtest I had to introduce support for coloring output from bootstrap, using the same dependencies annotate-snippets uses. It's now possible to use builder.color_for_stdout(Color::Red, "text") and builder.color_for_stderr(Color::Green, "text") across all of bootstrap, automatically respecting the --color flag and whether the stream is a terminal or not.
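The decision behind "respecting the `--color` flag and whether the stream is a terminal" can be sketched with the standard library alone. The `ColorChoice` and `Color` enums and the `should_color` and `paint` helpers below are hypothetical names for illustration; they are not the `builder.color_for_stdout` implementation, and raw ANSI escapes are used here only to keep the example dependency-free.

```rust
use std::io::IsTerminal;

/// Value of a `--color`-style flag (hypothetical type for this sketch).
#[derive(Clone, Copy)]
enum ColorChoice {
    Always,
    Never,
    Auto,
}

#[derive(Clone, Copy)]
enum Color {
    Red,
    Green,
}

/// Decides whether to emit color for a stream, given the flag and whether
/// that stream is attached to a terminal.
fn should_color(choice: ColorChoice, is_terminal: bool) -> bool {
    match choice {
        ColorChoice::Always => true,
        ColorChoice::Never => false,
        ColorChoice::Auto => is_terminal,
    }
}

/// Wraps `text` in ANSI color codes when `enabled`, otherwise returns it as-is.
fn paint(color: Color, text: &str, enabled: bool) -> String {
    if !enabled {
        return text.to_string();
    }
    let code = match color {
        Color::Red => 31,
        Color::Green => 32,
    };
    format!("\x1b[{code}m{text}\x1b[0m")
}

fn main() {
    // stdout and stderr are checked separately, since one may be redirected.
    let out = should_color(ColorChoice::Auto, std::io::stdout().is_terminal());
    let err = should_color(ColorChoice::Auto, std::io::stderr().is_terminal());
    println!("{}", paint(Color::Green, "ok", out));
    eprintln!("{}", paint(Color::Red, "FAILED", err));
}
```

Checking each stream independently is what lets `x.py test > log.txt` keep colored diagnostics on stderr while writing plain text to the file.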

I recommend reviewing the PR commit-by-commit.
r? @Mark-Simulacrum

Behind the scenes Clippy uses compiletest-rs, which doesn't support the --json flag we added to Rust's compiletest.

ehuss · 2023-03-02T23:56:07Z

I noticed this removes the output from Cargo and places it at the end. For example, it now shows:

running 0 tests


test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 24.40µs


running 49 tests
.................................................

test result: ok. 49 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 4.69ms


running 19 tests
...................

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.82ms


running 15 tests
...............

test result: ok. 15 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 28.05ms

instead of:

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/ieee.rs (obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/ieee-c3f97d5f9513b69f)

running 49 tests
.................................................
test result: ok. 49 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/ppc.rs (obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/ppc-8d20debf1e8d2487)

running 19 tests
...................
test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/lib.rs (obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_arena-d805e744b0c02363)

running 15 tests
...............
test result: ok. 15 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

Because it delays writing the stderr till the end. That can make it difficult to see which test suite is running at any one time. It also removes any sense of progress, as you just see it hang while cargo is building things (no progress bar, no "Compiling" messages, etc.). Is it at all possible to process both stderr and stdout at the same time?

Another concern is that delaying reading stderr could potentially deadlock. It's not too hard to generate enough data on stderr to fill the pipe buffer (particularly on platforms with smaller buf sizes). For example, running ./x.py test src/tools/cargo -v caused it to deadlock.

If you run with something like --nocapture, then it panics because it doesn't get something that looks like JSON. Would it maybe be possible for it to ignore lines that aren't JSON?

pietroalbini · 2023-03-03T08:53:47Z

> Because it delays writing the stderr till the end. That can make it difficult to see which test suite is running at any one time. It also removes any sense of progress, as you just see it hang while cargo is building things (no progress bar, no "Compiling" messages, etc.). Is it at all possible to process both stderr and stdout at the same time?
>
> Another concern is that delaying reading stderr could potentially deadlock. It's not too hard to generate enough data on stderr to fill the pipe buffer (particularly on platforms with smaller buf sizes). For example, running ./x.py test src/tools/cargo -v caused it to deadlock.

I made the change to show the standard error only at the end because without it the message compiletest displays when something fails gets interleaved with the failure message. I agree with you that doesn't work, I'll try a different approach.

> If you run with something like --nocapture, then it panics because it doesn't get something that looks like JSON. Would it maybe be possible for it to ignore lines that aren't JSON?

Right, forgot about --nocapture 😅 I'll do that.
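A minimal sketch of the "ignore lines that aren't JSON" idea: classify each stdout line with a cheap shape check and pass anything else through unchanged. The `Line` enum and `classify` function are hypothetical names; a real renderer would go on to parse the `MaybeJson` case and would still need a fallback if parsing fails.

```rust
/// How a line read from the test binary's stdout should be treated.
#[derive(Debug, PartialEq)]
enum Line<'a> {
    /// Looks like a JSON object; hand it to the JSON parser.
    MaybeJson(&'a str),
    /// Anything else, e.g. `--nocapture` output; print it unchanged.
    Plain(&'a str),
}

/// Cheap shape check: a trimmed line that starts with `{` and ends with `}`
/// is treated as a candidate JSON message.
fn classify(line: &str) -> Line<'_> {
    let trimmed = line.trim();
    if trimmed.starts_with('{') && trimmed.ends_with('}') {
        Line::MaybeJson(trimmed)
    } else {
        Line::Plain(line)
    }
}

fn main() {
    let stdout = "{ \"type\": \"suite\", \"event\": \"started\", \"test_count\": 1 }\n\
                  hello from a test\n";
    for line in stdout.lines() {
        match classify(line) {
            Line::MaybeJson(json) => println!("would parse: {json}"),
            Line::Plain(text) => println!("{text}"),
        }
    }
}
```

The check is deliberately permissive: a test that prints a line wrapped in braces would be misclassified, which is why a parse failure should not be fatal.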

pietroalbini · 2023-03-03T10:13:22Z

So, to reiterate the problem with stderr: compiletest emits a message to stderr at the end of a failing run, and that message would always be interleaved with the failure output. The solutions that came to mind were:

1. Capture the standard error and show it at the end. This caused the problems Eric rightfully pointed out.
2. Spawn a separate thread that re-prints the captured standard error as it's outputted by the process, but have both the renderer and the stderr thread lock a mutex whenever they write. This would fully solve the problem, but once stderr is captured Cargo doesn't color the messages anymore, and I'm not aware of any easy way to have a piped stream pretend to be a tty.
3. Change compiletest to emit a special {"type":"compiletest_message","message":"foo"} instead of foo when --json is passed. This works, as the message would be displayed by the renderer, but it feels inelegant.
4. Change compiletest to use println!() instead of eprintln!() to print that message. With the change of handling non-JSON stdout, this has the same effect as 3. but in a cleaner way.

I went with option 4., let me know if you want a different approach to be used or if there are approaches I didn't consider.
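For reference, the thread-plus-mutex approach that was considered and rejected could look roughly like the sketch below. It also shows why reading both streams concurrently avoids the pipe-buffer deadlock raised earlier: neither stream is left unread while the other is drained. The function names and the `[stdout]`/`[stderr]` tags are hypothetical, and the streams are generic readers so the example runs without spawning a process.

```rust
use std::io::{BufRead, BufReader, Read};
use std::sync::Mutex;
use std::thread;

/// Copies `stream` line by line into `sink`, taking the lock for each line so
/// that lines from different streams never interleave mid-line.
fn forward_lines<R: Read>(stream: R, tag: &str, sink: &Mutex<Vec<String>>) {
    for line in BufReader::new(stream).lines() {
        let Ok(line) = line else { break };
        sink.lock().unwrap().push(format!("[{tag}] {line}"));
    }
}

/// Drains stdout on the current thread and stderr on a helper thread.
fn drain_concurrently<O, E>(stdout: O, stderr: E) -> Vec<String>
where
    O: Read + Send,
    E: Read + Send,
{
    let sink = Mutex::new(Vec::new());
    thread::scope(|s| {
        s.spawn(|| forward_lines(stderr, "stderr", &sink));
        forward_lines(stdout, "stdout", &sink);
    });
    sink.into_inner().unwrap()
}

fn main() {
    // In-memory stand-ins for a child process's piped stdout and stderr.
    let stdout = std::io::Cursor::new("running 1 test\ntest foo ... ok\n");
    let stderr = std::io::Cursor::new("   Compiling foo v0.1.0\n");
    for line in drain_concurrently(stdout, stderr) {
        println!("{line}");
    }
}
```

With a real child process, the two readers would be the handles obtained from `child.stdout.take()` and `child.stderr.take()` after spawning with `Stdio::piped()`. Order is preserved within each stream, but the relative order of stdout and stderr lines depends on thread scheduling, and the child sees a pipe rather than a terminal, which is the color-loss drawback described in option 2.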

bjorn3 · 2023-03-04T11:16:05Z

An alternative to option 4 is to redirect stdout and stderr to the same pipe or file.

pietroalbini · 2023-03-04T13:41:37Z

Wouldn't that have the same problem as option 2 (losing colors)?

bjorn3 · 2023-03-04T16:36:05Z

Right. Unless you did use a pseudo terminal, but that is not easily portable to Windows.

Mark-Simulacrum · 2023-03-04T22:47:40Z

We only lose colors on the " Running tests/ppc.rs (obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/ppc-8d20debf1e8d2487)", right? Or on both that and test output? I feel like that line not being colored is.. fine, but then I typically don't find colors very useful anyway...

pietroalbini · 2023-03-05T11:58:25Z

@Mark-Simulacrum we would lose:

- Color on the "Running tests/ppc.rs" line
- Color on the "Compiling rustc_data_structures" lines
- The Cargo progress bar for compilation

ehuss

Overall it looks like this should work.

It seems unfortunate to need to write such a large amount of code for rendering. I'm wondering if an alternate solution would be to extend --logfile to contain the information you want. I realize that would be a more difficult change to make, but perhaps something to pursue later.

To support that, I think there would need to be some mechanism to indicate the format for the logfile. I'm not sure if that is just the combination of --format and --logfile.

Another thing that would need to be addressed is having some sort of templated filename. Right now, --logfile will overwrite the file. That is a problem for running cargo test on something that runs multiple tests (like multiple integration tests, or something with doctests). Somehow one would need to be able to map the file to the kind of test being run.
(Or have --logfile extend the file, and have extra information about what is being run.)

bors · 2023-03-21T14:33:17Z

⌛ Testing commit aacbd86 with merge 6667682...

bors · 2023-03-21T17:17:19Z

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 6667682 to master...

rust-timer · 2023-03-21T19:16:35Z

Finished benchmarking commit (6667682): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.4% | [0.3%, 0.4%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.4% | [0.3%, 0.4%] | 2 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 1.3% | [0.8%, 1.8%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.4% | [-3.9%, -0.4%] | 5 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.6% | [-3.9%, 1.8%] | 7 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.5% | [0.4%, 0.6%] | 4 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [0.4%, 0.6%] | 4 |

…_output, r=pietroalbini

Bugfix: avoid panic on invalid json output from libtest

rust-lang#108659 introduces a custom test display implementation. It does so by using libtest to output json. The stdout is read and parsed; the code trims the line read and checks whether it starts with a `{` and ends with a `}`. If so, it concludes that it must be a json encoded `Message`. Unfortunately, this does not work in all cases:

- This assumes that tests running with `--nocapture` will never start and end lines with `{` and `}` characters
- Output is generated by issuing multiple `write_message` [statements](https://github.com/rust-lang/rust/blob/master/library/test/src/formatters/json.rs#L33-L60), where only the last one issues a `\n`. This likely results in a race condition, as we see multiple json outputs on the same line when running tests for the `x86_64-fortanix-unknown-sgx` target:

```
10:21:04  Running tests/run-time-detect.rs (build/x86_64-unknown-linux-gnu/stage1-std/x86_64-fortanix-unknown-sgx/release/deps/run_time_detect-8c66026bd4b1871a)
10:21:04
10:21:04 running 1 tests
10:21:04 test x86_all ... ok
10:21:04  Running tests/thread.rs (build/x86_64-unknown-linux-gnu/stage1-std/x86_64-fortanix-unknown-sgx/release/deps/thread-ed5456a7d80a6193)
10:21:04 thread 'main' panicked at 'failed to parse libtest json output; error: trailing characters at line 1 column 135, line: "{ \"type\": \"suite\", \"event\": \"ok\", \"passed\": 1, \"failed\": 0, \"ignored\": 0, \"measured\": 0, \"filtered_out\": 0, \"exec_time\": 0.000725911 }{ \"type\": \"suite\", \"event\": \"started\", \"test_count\": 1 }\n"', render_tests.rs:108:25
```

This PR implements a partial fix by being much more conservative about what it asserts is a valid json encoded `Message`. This prevents panics, but still does not resolve the race condition. A discussion is needed about where exactly this race condition comes from and how it can best be avoided.

cc: `@jethrogb`, `@pietroalbini`
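One way to be "more conservative" than the brace check is to verify that the line contains exactly one top-level object before treating it as a message. The sketch below is an illustration of that idea, not the fix that was merged: `is_single_json_object` is a hypothetical helper that only checks structure (brace depth and string boundaries), and a real renderer would still parse the line and fall back to printing it if parsing fails.

```rust
/// Returns true only if `line` consists of exactly one top-level `{ ... }`
/// object, ignoring braces that appear inside JSON strings.
fn is_single_json_object(line: &str) -> bool {
    let trimmed = line.trim();
    if !trimmed.starts_with('{') {
        return false;
    }
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    let mut closed = false; // the first top-level object has ended
    for c in trimmed.chars() {
        if closed {
            // Anything after the first object, such as a second `{`.
            return false;
        }
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1; // depth is at least 1 here: the line starts with `{`
                if depth == 0 {
                    closed = true;
                }
            }
            _ => {}
        }
    }
    closed
}

fn main() {
    let single = r#"{ "type": "suite", "event": "started", "test_count": 1 }"#;
    let doubled = format!("{single}{single}");
    println!("single:  {}", is_single_json_object(single));
    println!("doubled: {}", is_single_json_object(&doubled));
}
```

The doubled line is the shape seen in the panic message above: it starts with `{` and ends with `}`, so the simple check accepts it, while the depth scan rejects it.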

…huss

Validate `ignore` and `only` compiletest directive, and add human-readable ignore reasons

This PR adds strict validation for the `ignore` and `only` compiletest directives, failing if an unknown value is provided to them. Doing so uncovered 79 tests in `tests/ui` that had invalid directives, so this PR also fixes them.

Finally, this PR adds human-readable ignore reasons when tests are ignored due to `ignore` or `only` directives, like *"only executed when the architecture is aarch64"* or *"ignored when the operative system is windows"*. This was the original reason why I started working on this PR and rust-lang#108659, as we need both of them for Ferrocene.

The PR is a draft because the code is extremely inefficient: it calls `rustc --print=cfg --target $target` for every rustc target (to gather the list of allowed ignore values), which on my system takes between 4s and 5s, and performs a lot of allocations of constant values. I'll fix both of them in the coming days.

r? `@ehuss`
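The combination of strict validation and human-readable reasons can be sketched as a single function that either produces a reason or rejects the directive. Everything here is illustrative: `ignore_reason` is a hypothetical name, the allow-lists are hard-coded stand-ins for the values gathered from `rustc --print=cfg`, and only the two message shapes quoted above are taken from the description.

```rust
// Hypothetical allow-lists for this sketch; the description above gathers the
// real values from `rustc --print=cfg --target $target` instead.
const KNOWN_OS: &[&str] = &["linux", "macos", "windows"];
const KNOWN_ARCH: &[&str] = &["aarch64", "x86_64"];

/// Turns a directive such as `ignore-windows` or `only-aarch64` into a
/// human-readable reason, or returns an error for unknown values.
fn ignore_reason(directive: &str) -> Result<String, String> {
    let (kind, value) = directive
        .split_once('-')
        .ok_or_else(|| format!("malformed directive `{directive}`"))?;
    // Wording mirrors the messages quoted in the PR description.
    let subject = if KNOWN_OS.contains(&value) {
        "operative system"
    } else if KNOWN_ARCH.contains(&value) {
        "architecture"
    } else {
        return Err(format!("unknown value `{value}` in `{directive}`"));
    };
    match kind {
        "ignore" => Ok(format!("ignored when the {subject} is {value}")),
        "only" => Ok(format!("only executed when the {subject} is {value}")),
        _ => Err(format!("unknown directive kind `{kind}`")),
    }
}

fn main() {
    for directive in ["only-aarch64", "ignore-windows", "ignore-widnows"] {
        match ignore_reason(directive) {
            Ok(reason) => println!("{directive}: {reason}"),
            Err(error) => println!("{directive}: error: {error}"),
        }
    }
}
```

Rejecting the misspelled `ignore-widnows` instead of silently treating it as "never matches" is the kind of failure that surfaced the invalid directives mentioned above.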

fix running Miri tests

This partially reverts rust-lang#108659 to fix rust-lang#110102: the Miri test runner does not support any flags, they are interpreted as filters instead which leads to no tests being run. I have not checked any of the other test runners for whether they are having any trouble with these flags.

Cc `@pietroalbini` `@Mark-Simulacrum` `@jyn514`

…ynchronization, r=pietroalbini

Ensure test library issues json string line-by-line

rust-lang#108659 introduces a custom test display implementation. It does so by using libtest to output json. The stdout is read line by line and parsed. The code trims the line read and checks whether it starts with a `{` and ends with a `}`.

Unfortunately, there is a race condition in how json data is written to stdout. The `write_message` function calls `self.out.write_all` repeatedly to write a buffer that contains (partial) json data, or a new line. There is no lock around the `self.out.write_all` functions. Similarly, the `write_message` function itself is called with only partial json data. As these functions are called from concurrent threads, this may result in json data ending up on the same stdout line. This PR avoids this by buffering the complete json data before issuing a single `self.out.write_all`.

(rust-lang#109484 implemented a partial fix for this issue; it only avoids that failed json parsing would result in a panic.)

cc: `@jethrogb`, `@pietroalbini`
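The "buffer the complete json data, then issue a single `write_all`" approach can be illustrated with a small standalone writer. This is a sketch and not libtest's formatter: `write_suite_started` and `CountingWriter` are hypothetical names, and the event layout is copied from the `suite started` message visible in the panic log from rust-lang#109484.

```rust
use std::io::{self, Write};

/// Writes one `suite started` event as a complete line with a single
/// `write_all` call, so a concurrent writer cannot split it.
fn write_suite_started<W: Write>(out: &mut W, test_count: usize) -> io::Result<()> {
    // Assemble the whole line, including the trailing newline, first...
    let line = format!(
        "{{ \"type\": \"suite\", \"event\": \"started\", \"test_count\": {test_count} }}\n"
    );
    // ...then hand it to the writer in one call.
    out.write_all(line.as_bytes())
}

/// Helper that records how many `write` calls it received.
struct CountingWriter {
    data: Vec<u8>,
    writes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut out = CountingWriter { data: Vec::new(), writes: 0 };
    write_suite_started(&mut out, 1).expect("writing to memory cannot fail");
    print!("{}", String::from_utf8_lossy(&out.data));
    println!("write calls: {}", out.writes);
}
```

A single `write_all` narrows the window for interleaving but is only atomic if the underlying writer accepts the whole buffer at once; holding a lock around the write is the stronger guarantee when several threads share the stream.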

pietroalbini added 3 commits March 2, 2023 16:20

add the --json flag to compiletest

d7049ca

render compiletest output with render_tests

d2f3806

add support for terse output

f96774b

rustbot assigned Mark-Simulacrum Mar 2, 2023

rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Mar 2, 2023

pietroalbini added 5 commits March 2, 2023 16:33

add a splash of color

f816d3a

record tests in build metrics

9388c8e

add support for benchmarks

b14b355

switch all tests to use render_tests

50b3583

avoid overlapping stderr

ad9a444

pietroalbini force-pushed the pa-test-metrics branch from f56ced5 to ad9a444 Compare March 2, 2023 15:34

do not use render_tests for clippy

f23e205

Behind the scenes Clippy uses compiletest-rs, which doesn't support the --json flag we added to Rust's compiletest.

pietroalbini added 2 commits March 3, 2023 10:03

handle non-json output in stdout

9a1ff1b

change approach to prevent interleaving compiletest message

4958272

pietroalbini force-pushed the pa-test-metrics branch from 0faacca to 4958272 Compare March 3, 2023 10:10

fix name of the field containing the ignore reason

3248ab7

ehuss reviewed Mar 6, 2023

pietroalbini force-pushed the pa-test-metrics branch from c89f68d to e62c37a Compare March 7, 2023 09:37

bjorn3 reviewed Mar 7, 2023

switch to termcolor

c015d0d

bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 21, 2023

bors merged commit 6667682 into rust-lang:master Mar 21, 2023

rustbot added this to the 1.70.0 milestone Mar 21, 2023

This was referenced Mar 21, 2023

no python in shell scripts #107812

Closed

Rename 'src/bootstrap/native.rs' to llvm.rs #109418

Merged

pietroalbini deleted the pa-test-metrics branch March 22, 2023 08:37

raoulstrackx mentioned this pull request Mar 22, 2023

Bugfix: avoid panic on invalid json output from libtest #109484

Merged

ehuss mentioned this pull request Mar 22, 2023

Add warning message when no tests are run. rust-lang/cargo#11875

Open

raoulstrackx mentioned this pull request Mar 29, 2023

Ensure test library issues json string line-by-line #109729

Merged

This was referenced Apr 11, 2023

fix running Miri tests #110177

Merged

Miri UI tests aren't run on CI (or locally) #110102

Closed

Strange indentation in libtest output #104092

Closed

pietroalbini mentioned this pull request May 2, 2023

Tracking issue for libtest JSON output #49359

Open

ehuss mentioned this pull request May 21, 2023

./x.py test does not explain why a #[should_panic] test failed #111825

Closed

matthiaskrgr mentioned this pull request Feb 13, 2024

ICE: downgrade_to_delayed_bug: cannot downgrade Warning to DelayedBug: not an error #121006

Closed
