test_seldon_alert_rules test case is failing, potential race condition #244

Closed
DnPlas opened this issue Feb 23, 2024 · 2 comments · Fixed by #243
Labels
bug Something isn't working

Comments

@DnPlas (Contributor) commented Feb 23, 2024

Bug Description

The test case is failing with the following message:

   File "/home/runner/work/seldon-core-operator/seldon-core-operator/tests/integration/test_charm.py", line 204, in test_seldon_alert_rules
    assert up_query_response["data"]["result"][0]["value"][1] == "1"
IndexError: list index out of range

which means that `up_query_response` either contains no results (the `result` list is empty) or is missing data/values entirely.
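
For reference, a successful instant query against Prometheus' /api/v1/query endpoint returns a vector result shaped roughly like the sketch below (the label values are illustrative), which is the structure the assertion indexes into:

```python
# Illustrative shape of a successful /api/v1/query response for the `up` metric;
# the label values below are made up, but the nesting is what the test indexes.
up_query_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "juju_application": "seldon-controller-manager"},
                "value": [1708646400.0, "1"],  # [<unix timestamp>, <sample value as string>]
            }
        ],
    },
}

# The failing assertion walks this structure; if prometheus has not scraped the
# target yet, "result" is an empty list and result[0] raises IndexError.
assert up_query_response["data"]["result"][0]["value"][1] == "1"
```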

This issue started happening after 1d1a6f5 introduced a new assertion to ensure the up metric is not firing any alerts.

This issue affects both main and track/1.17.

To Reproduce

I was only able to reproduce it in the CI

Environment

on_push CI

Relevant Log Output

Latest CI run

DnPlas added the bug (Something isn't working) label Feb 23, 2024
Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5374.

This message was autogenerated

@DnPlas (Contributor, Author) commented Feb 23, 2024

There is a potential race condition between the moment the test case runs the assertion and the moment the prometheus charm has actually scraped metrics from seldon-controller-manager. In other charms we have added retry logic to give prometheus time to scrape the metrics and make them available at its query endpoint. #243 attempts to fix this issue by adding such a retry, as sketched below.
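
For illustration, a minimal sketch of that retry, assuming the tenacity library and a hypothetical fetch_up_query helper standing in for however the test queries prometheus (the actual change is in #243):

```python
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(10), wait=wait_exponential(multiplier=1, max=30))
def assert_up_metric_is_one():
    # fetch_up_query() is a hypothetical placeholder for however the test queries
    # prometheus for the `up` metric of the seldon-controller-manager unit.
    up_query_response = fetch_up_query()
    # Until prometheus has scraped the unit, "result" is empty and indexing it
    # raises; tenacity then waits and retries instead of failing the test outright.
    assert up_query_response["data"]["result"][0]["value"][1] == "1"
```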

DnPlas added a commit that referenced this issue Feb 23, 2024
* tests: add a retry when asserting the up metric

Adding a retry when checking the state of an alert gives prometheus-k8s time to scrape
the necessary metrics for a unit; without it we may run into a race condition where the
assertion on the metric runs before prometheus has even been able to scrape.
This commit adds retry logic to avoid this.

Fixes #244