
x-pack/metricbeat AWS tests failure #40242

Closed
oakrizan opened this issue Jul 15, 2024 · 8 comments
Labels
flaky-test Unstable or unreliable test cases. Team:obs-ds-hosted-services Label for the Observability Hosted Services team

Comments

oakrizan (Contributor) commented Jul 15, 2024

Flaky Test

Stack Trace

=== Failed
=== FAIL: x-pack/metricbeat/module/aws/sqs TestFetch (17.64s)
    sqs_integration_test.go:28:
        Error Trace:    /go/src/github.com/elastic/beats/x-pack/metricbeat/module/aws/sqs/sqs_integration_test.go:28
        Error:          Should NOT be empty, but was []
        Test:           TestFetch

To execute the AWS tests on CI, the aws label should be added to the PR.

oakrizan added the flaky-test label Jul 15, 2024
botelastic bot added the needs_team label Jul 15, 2024
dliappis mentioned this issue Jul 15, 2024
dliappis (Contributor) commented Jul 15, 2024

FYI @elastic/obs-ds-hosted-services

dliappis added the Team:obs-ds-hosted-services label Jul 15, 2024
botelastic bot removed the needs_team label Jul 15, 2024
dliappis commented Jul 15, 2024

zmoog (Contributor) commented Jul 15, 2024

Hey @dliappis, we're checking the tests. @kaiyan-sheng thinks we may need to update the latency settings.

kaiyan-sheng (Contributor) commented Jul 15, 2024
I created an SQS queue at 9:35 AM, but the first CloudWatch metric didn't show up until 9:41 AM. With a 5-minute latency configured, unless we wait for a long time before checking for events, there is a high chance of not seeing any data points.

zmoog (Contributor) commented Jul 15, 2024

We may need to fetch the metrics more than once to account for the latency. Testing is in progress.

zmoog (Contributor) commented Jul 16, 2024

@dliappis, the PR #40251 is in review.

We added a PeriodicReportingFetchV2Error function that tries to fetch the metrics multiple times until it gets values or the timeout expires. The (non-periodic) ReportingFetchV2Error only fetches the metric values once, resulting in a much higher chance of not getting any metric values.

I ran the tests five times, with five successful executions.

There are other aspects we want to investigate and improve, but I hope these changes will remove the flakiness.

dliappis (Contributor) commented:

> @dliappis, the PR #40251 is in review.

Thank you @zmoog for the quick action here! Looking forward to getting the PR merged and the issue resolved ASAP, as pretty much all branches are now hitting this issue most of the time (example: https://buildkite.com/elastic/beats-pipeline-scheduler/builds/89#0190bdfe-8bed-442a-9f72-5b7074694c12, see also here).

zmoog (Contributor) commented Jul 22, 2024

I'm closing this since the issue seems to be resolved. Feel free to reopen it in case the problem shows up again.

zmoog closed this as completed Jul 22, 2024