-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add annotations attribute to job spans #3
base: main
Are you sure you want to change the base?
feat: add annotations attribute to job spans #3
Conversation
dc192fc
to
bae9dab
Compare
469650d
to
c6452af
Compare
// We only care about failure annotations generated by github files, | ||
// excluding excluding annotations from other tools like rspec failures | ||
const annotations = annotationsResponse.data.filter( | ||
(a: JobAnnotation) => | ||
a.annotation_level === "failure" && a.path === ".github", | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on investigation of some historical jobs, GitHub and Infra-related failures originate from a path
of .github
, while an rspec
failure originates from a path
of the ruby spec file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For the infra-related failures, is it possible to perform additional attribution e.g. which failures originated from GitHub vs. our own infrastructure?
- Based on the commentary (in docstrings) above, does this mean that we have an infrastructure failure when
github.conclusion == failure
andgithub.job.annotations is not null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the infra-related failures, is it possible to perform additional attribution e.g. which failures originated from GitHub vs. our own infrastructure?
This is the hope! However, the messages in these annotations vary and I think we'll need to gather some data and work to classify distinct messages into buckets, eg. the PR description example, Could not assume role with OIDC...
I think we'd classify as a infra
-type failure.
Based on the commentary (in docstrings) above, does this mean that we have an infrastructure failure when github.conclusion == failure and github.job.annotations is not null?
Following on my clarification above, that's not necessarily the case as we'd need to inspect the annotation message to really determine root cause; the comments/logic here is just to exclude capturing other types of Job failures, like a rspec
failure — does that make sense?
const annotationMessages = jobAnnotations | ||
.map((a: JobAnnotation) => a.message) | ||
.join(";"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joining several failure messages together isn't ideal, but currently don't see a better way to ensure we're capturing any/all failures (there can be more than one). This still gives us the ability to create an SLI based on matching specific keywords to this string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, Honeycomb doesn't have support for arrays. As for capturing these failure messages and possibly using these to created an SLI (based on a derived field), we can feature flag this using the ci-experiments action to gather some data on a subset of the failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can feature flag this using the ci-experiments action to gather some data on a subset of the failures
We can do that!
@tcsullens when able could you pls update this repository so that:
Much thanks |
@brett-anderson Do you have permissions to help out here? |
I have added all platform team members as admins on the repo and protected main. There are no required CI checks right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks @tcsullens!
Add span attribute for annotation messages from failed jobs to better attribute GHA job failures. This will allow us to collect and analyze root causes of failed jobs.
This implementation collects annotations messages from only expected GitHub/Infra errors and joins each failure message into a
;
-separated string as the attribute value. We expect to first collect some data and then determine how best to setup an SLI to track/alert on GitHub/Infra failures.The actual code changes are in the
src/
folderAn example using this test PR, https://github.com/wrapbook/app/pull/30740, sets a new span attribute
github.job.annotations
to the message of the annotation (Trace Link):Additional traces from testing: https://ui.honeycomb.io/wrapbook/environments/gha-ci/datasets/ci-v2/result/ANkTYWgHYXi?hideCompare
sc-126750