
ci: use juju 3.4/stable instead of 3.5/stable in track 1.17 #258

Closed
wants to merge 11 commits into from


DnPlas (Contributor) commented Jun 21, 2024

kimwnasptd and others added 11 commits November 27, 2023 11:17
This PR updates the .github files to:
* Ensure we have a file for tasks/enhancements
* Ensure we expose the DoD (definition of done) in task issues
* Use the FastAPI for ticket sync instead of JIRA_URL
…#228)

- tests(integration): Configure the test_seldon_servers test so that it can be
  run against a single SeldonDeployment CR. Select the CR with pytest's `-k`
  flag and the corresponding keyword. Keywords are listed in CONTRIBUTING.md
  and match the id field of the objects in seldon_servers.py.
- tests(integration): Remove limits from the SeldonDeployments applied.
- Update CONTRIBUTING.md with instructions on how to run tests separately.
- ci: Run the test_seldon_servers integration test for each SeldonDeployment CR
  in a separate GH runner.
- ci: Extract the tox integration environments automatically using a script.

Fixes #229
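The commit does not show the extraction script itself. The following is a minimal sketch of one way to do it, assuming integration environments are declared as `[testenv:<name>]` sections whose name contains "integration". The `tox-environment` matrix key and the sample environment names are hypothetical.

```python
# Hypothetical sketch: build a GitHub Actions matrix from the integration
# environments declared in tox.ini. The "integration" naming convention and
# the "tox-environment" matrix key are assumptions for illustration.
import configparser
import json


def integration_envs(tox_ini_text: str) -> list:
    """Return env names from [testenv:<name>] sections containing 'integration'."""
    # interpolation=None leaves tox placeholders such as {posargs} untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(tox_ini_text)
    envs = []
    for section in parser.sections():
        prefix, _, name = section.partition(":")
        if prefix == "testenv" and "integration" in name:
            envs.append(name)
    return envs


def to_matrix_json(envs: list) -> str:
    """Serialise the list so a workflow step can read it back with fromJSON()."""
    return json.dumps({"tox-environment": envs})


# Hypothetical tox.ini content used only to exercise the functions above.
SAMPLE_TOX_INI = """
[tox]
skipsdist = True

[testenv:unit]
commands = pytest {posargs} tests/unit

[testenv:integration]
commands = pytest {posargs} tests/integration/test_charm.py

[testenv:seldon-servers-integration]
commands = pytest {posargs} tests/integration/test_seldon_servers.py
"""

if __name__ == "__main__":
    print(to_matrix_json(integration_envs(SAMPLE_TOX_INI)))
```

Emitting JSON lets a later job expand the list into one runner per environment, which matches the "one GH runner per test" goal described above.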
Update the task template to also include context and DoD.
* Use ROCKs for charm and predictor servers

* Update test data to expect ROCKs

* fix: Use updated tensorflow-serving ROCK

* tests: Update test results according to ROCK
* fix: correctly configure one scrape job to avoid firing alerts

The metrics endpoint configuration had two scrape jobs, one for the
regular metrics endpoint, and a second one based on a dynamic list of
targets. The latter caused the Prometheus scraper to try to scrape
metrics from `*:80/metrics`, which is not a valid endpoint. This
caused the UnitsUnavailable alert to fire constantly because that job
reported that the endpoint was not available.
This extra job was introduced by #94
with no apparent justification. Because the seldon charm has changed
since that PR, and the endpoint it configures is not valid, this
commit removes the extra job.

This commit also refactors the MetricsEndpointProvider instantiation and
removes the metrics-port config option as this value should not change.

Finally, this commit changes the alert rule interval from 0m to 5m, as
this interval is more appropriate for production environments.

Part of canonical/bundle-kubeflow#564

* tests: add an assertion for checking unit is available

The test_prometheus_grafana_integration test case queried Prometheus
and checked that the request returned successfully and that the application name and model
were listed correctly. To make this test case more accurate, we can add an assertion that
also checks that the unit is available; this way we avoid issues like the one described in
canonical/bundle-kubeflow#564.

Part of canonical/bundle-kubeflow#564
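As a rough illustration of the "unit is available" check, the sketch below parses the JSON returned by Prometheus' instant-query endpoint (`/api/v1/query?query=up`) and requires every matching `up` sample to equal 1. The `juju_application` label and the application name `seldon-controller-manager` are assumptions for this example, not taken from the test code.

```python
# Hypothetical sketch: decide whether a charm's unit is reported as up in the
# JSON returned by Prometheus' instant-query endpoint (/api/v1/query?query=up).
# The juju_application label and the application name are assumptions here.
import json


def unit_is_up(response_body: str, app_name: str) -> bool:
    """Return True if there is at least one `up` sample for app_name and all equal 1."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return False
    samples = [
        result
        for result in payload["data"]["result"]
        if result["metric"].get("juju_application") == app_name
    ]
    # Each instant-vector sample value is [unix_timestamp, "value as a string"].
    return bool(samples) and all(s["value"][1] == "1" for s in samples)


# Hypothetical response in the shape of a Prometheus instant vector.
EXAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "juju_application": "seldon-controller-manager"},
                "value": [1718976000.0, "1"],
            }
        ],
    },
})

if __name__ == "__main__":
    print(unit_is_up(EXAMPLE_RESPONSE, "seldon-controller-manager"))
```

Checking the sample value, not just the presence of the series, is what separates "Prometheus knows about the app" from "the unit is actually reachable".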
* tests: add a retry when asserting the up metric

Adding a retry when checking the state of an alert gives prometheus-k8s time to scrape
the necessary metrics for a unit. Without it, we may run into a race condition where the assertion
on the metric runs before Prometheus has been able to scrape.
This commit adds retry logic to avoid this.

Fixes #244
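The retry can be pictured with the stdlib-only sketch below. The real tests may use a retry library instead; `check` is a hypothetical callable (for example, one that queries the `up` metric), and the `attempts` and `delay` defaults are illustrative, not values from the test suite.

```python
# Minimal stdlib sketch of retrying an assertion until Prometheus has had time
# to scrape. `check` is a hypothetical callable; attempts/delay are illustrative.
import time
from typing import Callable


def retry_until(check: Callable[[], bool], attempts: int = 5, delay: float = 10.0) -> int:
    """Call check until it returns True and return the number of attempts used.

    Raises AssertionError if check never succeeds within `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # give the scraper time before the next attempt
    raise AssertionError(f"condition not met after {attempts} attempts")


if __name__ == "__main__":
    # Simulate a unit whose `up` metric only reads as available on the third try.
    readings = iter([False, False, True])
    print(retry_until(lambda: next(readings), attempts=5, delay=0))  # prints 3
```

Raising `AssertionError` keeps the failure looking like a normal pytest assertion once all attempts are exhausted.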
* Fix grafana dashboard by removing `uid` from the `datasource` fields.
* Add tags `ckf` and `seldon` to dashboard.

Part of canonical/bundle-kubeflow#856
Refs canonical/bundle-kubeflow#834
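The commit does not say whether the dashboard was edited by hand or by a script. The sketch below assumes a scripted edit on the parsed dashboard JSON. It removes `uid` from every `datasource` object, leaves the dashboard's own top-level `uid` alone, and adds the `ckf` and `seldon` tags. The sample dashboard layout is hypothetical.

```python
# Hypothetical sketch of the dashboard edit: drop "uid" from every "datasource"
# object and make sure the "ckf" and "seldon" tags are present.
import json
from typing import Any


def strip_datasource_uids(node: Any) -> Any:
    """Return a copy of node with 'uid' removed from all 'datasource' dicts."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key == "datasource" and isinstance(value, dict):
                value = {k: v for k, v in value.items() if k != "uid"}
            cleaned[key] = strip_datasource_uids(value)
        return cleaned
    if isinstance(node, list):
        return [strip_datasource_uids(item) for item in node]
    return node


def fix_dashboard(dashboard: dict) -> dict:
    """Apply both changes without mutating the input dashboard."""
    fixed = strip_datasource_uids(dashboard)
    tags = fixed.setdefault("tags", [])
    for tag in ("ckf", "seldon"):
        if tag not in tags:
            tags.append(tag)
    return fixed


# Hypothetical dashboard fragment used only to exercise fix_dashboard().
SAMPLE_DASHBOARD = {
    "uid": "seldon-dashboard",
    "panels": [
        {
            "datasource": {"type": "prometheus", "uid": "abc123"},
            "targets": [{"datasource": {"type": "prometheus", "uid": "abc123"}}],
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(fix_dashboard(SAMPLE_DASHBOARD), indent=2))
```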
* ci: bump juju to 3.5

Co-authored-by: NohaIhab <[email protected]>
DnPlas requested a review from a team as a code owner June 21, 2024 13:27
DnPlas (Contributor, Author) commented Jun 21, 2024

Closing as this component will fall out of support soon.

DnPlas closed this Jun 21, 2024
Labels
None yet
Projects
None yet

4 participants