
[ACTION NEEDED] Fix flaky integration tests at distribution level #667

Closed · Tracked by #4588
Labels: bug (Something isn't working), v2.14.0

gaiksaya (Member) commented Apr 3, 2024

What is the bug?
It was observed in 2.13.0 and other previous releases that this component manually signed off on the release despite failing integration tests. See opensearch-project/opensearch-build#4433 (comment)
The flakiness of the test runs takes a lot of the release team's time to collect a go/no-go decision and significantly lowers confidence in the release bundles.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Run the integration tests for this component and observe the failures.
  2. The issues can be reproduced using the steps listed in the AUTOCUT issues filed for failed integration test runs.

What is the expected behavior?
Tests should be consistently passing.

Do you have any additional context?
Please note that this is a hard blocker for the 2.14.0 release as per the discussion here

bbarani (Member) commented Apr 23, 2024

@martin-gaievski @vibrantvarun Can you please provide your inputs?

martin-gaievski (Member)

As per our deep dive, the inconsistency in the integration tests is caused by the implementation of the memory circuit breaker on the ml-commons side. In the scope of our tests we deploy and undeploy local models using the ml-commons API, and after multiple such calls the memory CB becomes open. I've opened an issue in ml-commons for this: opensearch-project/ml-commons#2308.
As a short-term mitigation for 2.14 we have optimized our tests (issues #683 and #689) to minimize the number of times the local model gets redeployed. As of now the integration tests consistently pass in our local copy of https://github.com/opensearch-project/opensearch-build/
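The pattern behind that mitigation, roughly, is to deploy the local model once per test class and reuse it across test methods instead of deploying and undeploying it around every test. The sketch below is only an illustration of that idea, not the actual neural-search test code; the helper names are hypothetical stand-ins for the REST calls the tests make to the ml-commons API.

```java
// Minimal sketch of the "deploy once, reuse" pattern assumed to be behind #683/#689.
// Helper methods are illustrative stubs, not real project APIs.
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class LocalModelReuseIT {

    private static String modelId;

    @BeforeClass
    public static void deployModelOnce() {
        // In the real tests this would register and deploy a local model through the
        // ml-commons REST API a single time for the whole class, instead of once per test,
        // so repeated deploy calls don't push the memory circuit breaker open.
        modelId = registerAndDeployLocalModel();
    }

    @AfterClass
    public static void undeployModelOnce() {
        // Undeploy only after all tests in the class have run.
        undeployModel(modelId);
    }

    @Test
    public void testNeuralQueryUsesSharedModel() {
        // Each test reuses the already-deployed model via its id; no extra deploy/undeploy calls.
        assert modelId != null;
    }

    // Illustrative stubs standing in for REST calls to the cluster.
    private static String registerAndDeployLocalModel() {
        return "shared-local-model-id";
    }

    private static void undeployModel(String id) {
        // no-op in this sketch
    }
}
```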

gaiksaya (Member, Author)

Adding 2.14.0 release manager @rishabh6788

martin-gaievski (Member)

It seems that the implemented approach is stable; there are no failing tests in the 2.14 release pipeline. Resolving this issue.
