Report ThresholdsHaveFailed on Cloud runs #3876
Conversation
// Although it isn't self-explanatory just by looking at [ResultStatus] and [RunStatus],
// the scenario where the test run has finished but failed is an exceptional case reserved
// for situations where thresholds have been crossed (failed). So, we report it as such.
if testProgress.RunStatus == cloudapi.RunStatusFinished {
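	// (Hedged continuation, paraphrased from the PR description rather than the literal
	// diff: when the run finished but the result status is failed, thresholds were
	// crossed, so the error is tagged with exit code 99. errext.WithExitCodeIfNone and
	// exitcodes.ThresholdsHaveFailed are assumed from the k6 codebase.)
	if testProgress.ResultStatus == cloudapi.ResultStatusFailed {
		return errext.WithExitCodeIfNone(
			errors.New("the test run has failed because thresholds were crossed"),
			exitcodes.ThresholdsHaveFailed,
		)
	}
}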
I'm not certain about this assumption. During a test run, it's also possible that an error happens during execution, or that the run is aborted by limits, as the comment below says, so just being in that run state doesn't guarantee that the test failed because of thresholds.
However, I must admit I don't know what the response from the backend looks like. Have you chatted with them?
If I got it correctly, we have RunStatuses specifically for those situations (see https://github.com/grafana/k6/blob/master/cloudapi/run_status.go#L16-L20).
Also, my assumption is that RunStatusAbortedThreshold doesn't apply here, because in this case the test hasn't been aborted (abortOnFail: true); it just finished and failed at the end because of unsuccessful threshold checks.
And yes, that's what I got after a conversation with @d14c and @Griatch, so maybe they can confirm it.
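For reference, the run statuses in the linked file look roughly like this (paraphrased from cloudapi/run_status.go; treat the exact names and values as a sketch, since they may have drifted):

type RunStatus int

const (
	RunStatusRunning            RunStatus = 3
	RunStatusFinished           RunStatus = 4
	RunStatusTimedOut           RunStatus = 5
	RunStatusAbortedUser        RunStatus = 6
	RunStatusAbortedSystem      RunStatus = 7
	RunStatusAbortedScriptError RunStatus = 8
	RunStatusAbortedThreshold   RunStatus = 9
	RunStatusAbortedLimit       RunStatus = 10
)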
Additionally, I know there are other edge cases that we may want to handle within this piece of code (others of the aforementioned RunStatuses), but as I noted in the PR description, I'd prefer to focus on this edge case here and leave the other scenarios for upcoming, separate PRs.
I agree with @olegbespalov that this sounds like a risky assumption. Furthermore, I have the feeling that every time we need to read this code, it will require a lot of cognitive load.
Could we ask the backend to add a dedicated RunStatus for it? Something like RunStatusFinishedWithCrossedThresholds.
@joanlopez A few points here:
- ABORTED_BY_THRESHOLD: this is a run_status end status for thresholds that are set to abort the test when crossed. In this case the test will never reach run_status FINISHED.
- run_status FINISHED can still be reached if the threshold was not set up to abort the test. In this case the outcome will only be reflected in the result_status, which is handled completely separately from run_status.
- result_status False/True will start out as True and will switch to False only on a failed threshold, afaik. So it's for example possible to have a test with run_status ABORTED_SYSTEM but with result_status True, because the threshold was never crossed.
For the upcoming public API we are introducing a 'success' status that is a little easier to reason around. Inviting @fornfrey on this to make sure I get it right; but it's basically a combination of run/result status that will fail both on a threshold cross and on a test that fails to complete properly.
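To summarize those combinations as code, here's a hedged sketch (it assumes a cloudapi.TestProgressResponse with RunStatus and ResultStatus fields, as used elsewhere in this thread; it is not actual k6 or backend code):

// interpretOutcome maps the run/result status combinations described above.
func interpretOutcome(progress cloudapi.TestProgressResponse) string {
	switch {
	case progress.RunStatus == cloudapi.RunStatusAbortedThreshold:
		// An abortOnFail threshold was crossed mid-run; FINISHED is never reached.
		return "aborted mid-run by an abortOnFail threshold"
	case progress.RunStatus == cloudapi.RunStatusFinished &&
		progress.ResultStatus == cloudapi.ResultStatusFailed:
		// The run completed, but a non-aborting threshold was crossed (this PR's case).
		return "finished, but thresholds were crossed"
	case progress.RunStatus == cloudapi.RunStatusFinished:
		return "finished, all thresholds passed"
	default:
		// Aborted by user/system/limit/script error; the result status may still
		// be "passed" if no threshold was ever crossed.
		return "aborted for a non-threshold reason"
	}
}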
So, if I got it right, it looks like my assumptions are confirmed again by @Griatch.
What do you think, guys? @olegbespalov @codebien
Do you want to go ahead like this? Or wait until we have the new API with clearer information?
If we prefer to wait, I can close this PR and create an issue with some context, marked as blocked, waiting for the new public API. As you prefer!
Thanks!
Thanks for all the input! It all sounds clear and legit, and it seems like we could clean up the TODO:
// TODO: use different exit codes for failed thresholds vs failed test (e.g. aborted by system/limit)
If we prefer to wait, I can close this PR and create an issue with some context, marked as blocked, waiting for the new public API. As you prefer!
I don't know the ETA of the new API, but if we could bring value earlier, that sounds better.
I guess my question is how likely it is:
- that the test has finished,
- that it fails,
- and that it isn't a threshold.
I guess it is fine if we have some false positives, but still.
Isn't that what the previous comment from @Griatch explains? Like:
result_status False/True will start out as True and will switch to False only on a failed threshold afaik.
I think that, in case there's any other error (e.g. aborted by system), it's reflected on the run_status, and not on the result_status, which seems to be tied to thresholds.
Ah, I guess I did skim over that particular line a bit too much.
Again, this seems like strange behaviour, but I guess it is what it is.
LGTM in general! The API seems strange, though, and I have added one more question to the discussion.
I guess my question is how likely it is:
- that the test has finished,
- that it fails,
- and that it isn't a threshold.
I guess it is fine if we have some false positives, but still.
Yeah, I agree, because:
(Note: It looks like there are probably a couple more scenarios that could also be reported more concretely, but I prefer to open a separate pull request for each, so we can discuss them independently.)
What?
It handles the exceptional case when testProgress.RunStatus == cloudapi.RunStatusFinished but testProgress.ResultStatus == cloudapi.ResultStatusFailed, which means that the test run failed due to thresholds being crossed, in order to exit with the specific error code (ThresholdsHaveFailed = 99).
Before
After
Why?
Currently, when a test run executed in Grafana Cloud k6 fails due to thresholds being crossed, it is reported as a generic cloud error (CloudTestRunFailed = 97), giving the user no clue about why the test run failed. In contrast, when the same test run is executed locally, the user receives meaningful feedback and the specific error code (ThresholdsHaveFailed = 99).
All that, imho, makes the whole user experience worse: not only is it inconsistent and prevents users from reacting to those exit codes, but the current feedback is so generic that it gives the user no clue at all about why the test run failed.
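For reference, a hedged sketch of the two exit-code constants involved (the values 97 and 99 come from this description; the exact declarations in k6's errext/exitcodes package are assumed):

const (
	CloudTestRunFailed   errext.ExitCode = 97 // generic cloud test run failure
	ThresholdsHaveFailed errext.ExitCode = 99 // thresholds were crossed
)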
Checklist
I have run the linter locally (make lint) and all checks pass.
I have run tests locally (make tests) and all tests pass.
Related PR(s)/Issue(s)