
Save CI test reports to S3. #19262

Merged
merged 5 commits into from
Jun 8, 2023

Conversation

benjyw
Contributor

@benjyw benjyw commented Jun 6, 2023

This will let us gather data on test performance/flakiness/timeouts.

The first commit is the interesting one. The second is just the regenerated yaml files.

@benjyw benjyw added the category:internal CI, fixes for not-yet-released features, etc. label Jun 6, 2023
@@ -545,8 +573,38 @@ def upload_log_artifacts(self, name: str) -> Step:
     "if": "always()",
     "continue-on-error": True,
     "with": {
-        "name": f"pants-log-{name.replace('/', '_')}-{self.platform_name()}",
-        "path": ".pants.d/pants.log",
+        "name": f"logs-{name.replace('/', '_')}-{self.platform_name()}",
Contributor Author

@benjyw benjyw Jun 6, 2023

While I'm here, renamed to omit the superfluous pants- prefix, for brevity.

"name": f"pants-log-{name.replace('/', '_')}-{self.platform_name()}",
"path": ".pants.d/pants.log",
"name": f"logs-{name.replace('/', '_')}-{self.platform_name()}",
"path": ".pants.d/*.log",
Contributor Author

@benjyw benjyw Jun 6, 2023

While I'm here, this will upload exception.log as well as the pants.log, which we should always have been doing.

@benjyw benjyw force-pushed the benjyw_save_test_reports_to_s3 branch 5 times, most recently from a82535a to 71e83ab Compare June 7, 2023 13:17
@benjyw
Contributor Author

benjyw commented Jun 7, 2023

Right now the path on S3 is rather long:

s3://logs.pantsbuild.org/test/reports/<platform>/<date>/<ref>/<workflow run id>/<workflow run attempt>/

And the <ref> itself can have 4 "path" components: e.g., for a PR it would be refs/pull/19262/merge

For example:

s3://logs.pantsbuild.org/test/reports/Linux-x86_64/2023-06-07/refs/pull/19262/merge/5200484861/1/

But I think this is fine. In practice it's easy to navigate, and very easy to script up a download via the aws cli's wildcarding:

$ aws s3 sync s3://logs.pantsbuild.org/test/reports/Linux-x86_64/2023-06-07/ . --exclude "*" --include "*.xml" 

The date is there so it's easy to download files for a certain date range. We lead with the platform because it's usually good to lead with a low-cardinality dimension, and because we typically investigate tests on a single platform.
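For illustration, the prefix scheme described above can be sketched as a small Python helper. This is hypothetical, not the PR's actual code: in CI the path is assembled from GitHub Actions environment variables, and the commit date (not the wall-clock date used here) is what appears in the key.

```python
import datetime


def s3_report_prefix(platform_name: str, ref: str, run_id: str, run_attempt: str) -> str:
    """Build the test-report key prefix: <platform>/<date>/<ref>/<run id>/<run attempt>/.

    Hypothetical helper mirroring the layout described above.
    """
    date = datetime.date.today().isoformat()  # CI uses the commit date instead
    return f"test/reports/{platform_name}/{date}/{ref}/{run_id}/{run_attempt}/"


prefix = s3_report_prefix("Linux-x86_64", "refs/pull/19262/merge", "5200484861", "1")
```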

@@ -65,7 +70,8 @@ def perform_copy(
     src_prefix: str,
     dst_prefix: str,
     path: str,
-    dst_region: str,
+    region: str,
+    acl: str | None = None,
Contributor Author

@benjyw benjyw Jun 7, 2023

The logs bucket is entirely private and has ACLs disabled, so we must not set an ACL in this case, while still setting one for objects in the binaries bucket.

Contributor

Given it's private, if someone (eg. me) wants to explore the data, how would they do it?

Contributor Author

We can create a user for you under the pants account!

@benjyw benjyw force-pushed the benjyw_save_test_reports_to_s3 branch from 71e83ab to 57be6ed Compare June 7, 2023 13:51
@benjyw
Contributor Author

benjyw commented Jun 7, 2023

I've confirmed that the files are written to S3 as expected.

@benjyw
Contributor Author

benjyw commented Jun 7, 2023

Still TODO: how to gather data on tests that time out at the Pants level, where we kill the pytest process so we won't get a pytest-level report.

@thejcannon
Member

Why not just use the artifacts from GitHub itself? https://github.com/pantsbuild/pants/actions/runs/5200836905

@benjyw
Contributor Author

benjyw commented Jun 7, 2023

Why not just use the artifacts from GitHub itself? https://github.com/pantsbuild/pants/actions/runs/5200836905

Discussed here. We could, but those are harder to work with, because it's not easy to tie them back to a time period, a platform, or even a SHA or PR #. It can be done with a lot of GitHub API tennis, but it's easier to drop into S3, and potentially from there index or further process them using AWS resources (dynamodb for example).
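To illustrate the "API tennis": tying a GHA artifact back to its run's SHA, branch, or PR requires at least one extra REST call per run. A rough sketch, using GitHub's documented "list workflow run artifacts" endpoint; the helper names are mine, not code from this PR:

```python
import json
import urllib.request


def run_artifacts_url(owner: str, repo: str, run_id: int) -> str:
    # Documented endpoint: GET /repos/{owner}/{repo}/actions/runs/{run_id}/artifacts
    return f"https://api.github.com/repos/{owner}/{repo}/actions/runs/{run_id}/artifacts"


def list_run_artifacts(owner: str, repo: str, run_id: int, token: str) -> list:
    """Fetch artifact metadata for one workflow run (network call; sketch only)."""
    req = urllib.request.Request(
        run_artifacts_url(owner, repo, run_id),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["artifacts"]


url = run_artifacts_url("pantsbuild", "pants", 5200836905)
```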

@thejcannon
Member

Hmmm OK. In a world where we don't yet have explicit dependable funding I worry about not using the off-the-shelf free thing. But that's also FUD which doesn't have grounding in any numbers. So I'll leave it up to those who understand AWS better. I'm happy we're taking strides to collect metrics/logs.

...speaking of I have a Pants plugin to upload metrics/logs to AWS CloudWatch. Is that something we might want to explore?

@benjyw
Contributor Author

benjyw commented Jun 7, 2023

It's teeny tiny amounts of data/outgoing bandwidth compared to our binary storage on S3. I would like to move away from that first. I agree with cutting AWS costs, but we'd best do that in cost order.

@benjyw
Contributor Author

benjyw commented Jun 7, 2023

...speaking of I have a Pants plugin to upload metrics/logs to AWS CloudWatch. Is that something we might want to explore?

Oooh! What sort of metrics?

@thejcannon
Member

...speaking of I have a Pants plugin to upload metrics/logs to AWS CloudWatch. Is that something we might want to explore?

Oooh! What sort of metrics?

Anything gatherable from a WorkunitsCallback.

Contributor

@huonw huonw left a comment

Looks good overall.

I think the cost aspect of S3 is fine, but there's a larger downside: we can't write to it from PRs from forks. This means we miss a large batch of data, and, in particular, the flakes that affect humans most embarrassingly (discouraging new/irregular contributors: us maintainers/people with push access have already overcome that hurdle). I assume the rate of flakes is similar/the distribution is the same, but it still seems unfortunate to drop that data...

However, the alternative is GHA artifacts, which are hard to query and have a retention time, after which they're deleted. Neither of these are complete blockers... but they certainly don't make GHA a slam-dunk.

Something is better than nothing, and neither option is obviously better than the other, so more than happy to go with S3 and iterate.


For timeouts, I wonder if it's a generally-useful feature for pants to be generating its own junit reports that capture them, for real users to put through their own test dashboards? Either that or zipkin/opentelemetry/... traces for whole builds, that report the work units as timed-out spans? (I'm assuming that metrics reported to CloudWatch might not be saying "test_foo.py timed out" specifically?)

./build-support/bin/copy_to_s3.py \
--src-prefix=dist/test/reports \
--dst-prefix=s3://logs.pantsbuild.org/{s3_path} \
--path=
Contributor

This --path= empty arg looks suspicious. Just confirming it's intentional?

Contributor Author

It is. I could use --path='' to emphasize.

@@ -65,7 +70,8 @@ def perform_copy(
     src_prefix: str,
     dst_prefix: str,
     path: str,
-    dst_region: str,
+    region: str,
+    acl: str | None = None,
Contributor

Given it's private, if someone (eg. me) wants to explore the data, how would they do it?

build-support/bin/generate_github_workflows.py (outdated; resolved)
build-support/bin/generate_github_workflows.py (outdated; resolved)
@benjyw
Contributor Author

benjyw commented Jun 7, 2023

Hmm yeah, this not working for forks is not great. But at least this is some good data to be starting with.

Maybe we can have a cron job that copies the gha artifacts to S3...

@benjyw
Contributor Author

benjyw commented Jun 7, 2023

I had in fact thought of Pants generating its own junit xml for timed-out tests, since they won't have one generated by pytest.
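A minimal sketch of what such a synthesized report could look like, assuming the common junit-xml schema; the helper and the point where Pants would invoke it are hypothetical:

```python
import xml.etree.ElementTree as ET


def timeout_junit_xml(suite: str, test: str, timeout_secs: float) -> str:
    """Synthesize a one-testcase JUnit report marking `test` as errored by timeout."""
    testsuite = ET.Element("testsuite", name=suite, tests="1", errors="1", failures="0")
    testcase = ET.SubElement(testsuite, "testcase", classname=suite, name=test)
    error = ET.SubElement(testcase, "error", message=f"Timed out after {timeout_secs}s")
    error.text = f"Pants killed the test process after {timeout_secs}s; no pytest report exists."
    return ET.tostring(testsuite, encoding="unicode")


report = timeout_junit_xml("tests.python.foo_test", "test_bar", 600.0)
```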

@benjyw
Contributor Author

benjyw commented Jun 7, 2023

...speaking of I have a Pants plugin to upload metrics/logs to AWS CloudWatch. Is that something we might want to explore?

Oooh! What sort of metrics?

Anything gatherable from a WorkunitsCallback.

So reinventing BuildSense ;-)

- Replace the long REF with the short one, and replace any
  slashes in that with underscores.
- Add the job id to the path.
- Set --path="" explicitly.
@benjyw benjyw requested a review from huonw June 7, 2023 20:35
Contributor

@huonw huonw left a comment

Awesome

+ self.platform_name()
+ "/"
+ "$(git show --no-patch --format=%cd --date=format:%Y-%m-%d)/"
+ "${GITHUB_REF_NAME//\\//_}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}/${GITHUB_JOB}"
Contributor

Just double checking: does this still give sensible keys for this PR?

In particular, I'm not at all sure of the value of GITHUB_REF_NAME in a pull request from the GHA docs?

(As a broader point, maybe this step (or the copy-to-S3 script) could print the fully substituted bucket/key-prefix for easier debugging and finding of the artefacts, when looking at the CI logs?)

Contributor Author

It's 12345/merge, which we write as 12345_merge

Contributor

Cool, seems good. Thanks for confirming!

@@ -975,12 +1027,25 @@ jobs:

'
- continue-on-error: true
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
Member

FWIW - Switching to use GH OIDC w/ AWS is more secure (done in the TC repo so you can refer to it for examples)
https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
AWS keys can leak and need to be rotated from time to time, by using OIDC you can avoid this.
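In generate_github_workflows.py's dict-based style, an OIDC credentials step might look roughly like this. The role ARN is a placeholder and the exact step shape is an assumption, not code from this PR:

```python
def configure_aws_oidc_step(role_arn: str, region: str) -> dict:
    """Sketch of a credentials step using GitHub OIDC instead of long-lived keys."""
    return {
        "name": "Configure AWS credentials via OIDC",
        "uses": "aws-actions/configure-aws-credentials@v2",
        "with": {"role-to-assume": role_arn, "aws-region": region},
    }


# The enclosing job would also need permissions: {"id-token": "write", "contents": "read"}.
step = configure_aws_oidc_step("arn:aws:iam::123456789012:role/gha-ci", "us-east-1")
```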

Contributor Author

Good to know! Will look into that for a followup for all our AWS use in GHA.

@benjyw benjyw merged commit ad20ba4 into main Jun 8, 2023
@benjyw benjyw deleted the benjyw_save_test_reports_to_s3 branch June 8, 2023 02:59
alonsodomin pushed a commit to alonsodomin/pants that referenced this pull request Jun 8, 2023
This will let us gather data on test performance/flakiness/timeouts. 

The first commit is the interesting one. The second is just the
regenerated yaml files.