[Feature] ☂️ Monitor compaction jobs running on shoot control planes #610

Open
8 of 9 tasks
abdasgupta opened this issue Jun 5, 2023 · 1 comment
Labels
kind/enhancement Enhancement, improvement, extension

Comments


abdasgupta commented Jun 5, 2023

Feature (What you would like to be added):
Druid runs in a different namespace than the shoot control plane, but the compaction jobs it triggers run in the shoot control plane. This makes it hard to collect the compaction job metrics and build a dashboard from them. Several Prometheus instances are involved, and each has to collect the metrics and forward them to the next. The compaction metrics need to be routed so that they ultimately reach the Prometheus running in the shoot control plane. Only then can the dashboards running in shoot control planes consume them.

Because Druid runs in the Garden namespace, the cache Prometheus can collect the Druid controller metrics, i.e. the compaction metrics. The control plane Prometheus can then federate those metrics along with the cAdvisor metrics for the compaction job. From the metrics scraped by the control plane Prometheus, we can select the shoot-specific compaction job metrics and show a dashboard for a particular shoot.
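
To make the federation step more concrete, here is a minimal sketch of how a request to the standard Prometheus `/federate` endpoint could be limited to one shoot's compaction series. The metric name patterns, the `etcd_namespace` label, the pod name pattern, the cache Prometheus URL, and the shoot namespace are all assumptions for illustration and are not taken from Druid or Gardener.

```python
from urllib.parse import urlencode


def shoot_selectors(shoot_namespace):
    """Return series selectors for one shoot's compaction metrics.

    The metric name patterns and label names are hypothetical placeholders.
    """
    return [
        # Assumed naming for the Druid controller's compaction metrics.
        f'{{__name__=~"etcddruid_compaction_.*",etcd_namespace="{shoot_namespace}"}}',
        # Assumed cAdvisor series for the compaction job pods of this shoot.
        f'{{__name__=~"container_.*",namespace="{shoot_namespace}",pod=~".*compact.*"}}',
    ]


def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    # Hypothetical cache Prometheus address and shoot namespace.
    url = federate_url(
        "http://prometheus-cache.garden.svc:9090",
        shoot_selectors("shoot--foo--bar"),
    )
    print(url)
```

In a real setup the same selectors would more likely live in the `params` of a federation scrape job on the control plane Prometheus. The sketch only shows which series would have to be matched.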

To further improve the visualization of compaction metrics, we can also create a dashboard in the seed. This dashboard may show aggregated compaction job performance.

In my first comment, I attached an image shared by @istvanballok and @rickardsjp that illustrates the flow.

Motivation (Why is this needed?):
We have druids that trigger compaction jobs once the number of delta events in the control plane etcd crosses a certain threshold. A compaction job compacts the delta events that have accumulated in object storage and creates a full snapshot from them. These jobs can be heavy at times, so we need proper monitoring for the jobs running in each shoot control plane.

Approach/Hint to implement the solution (optional):

@abdasgupta abdasgupta added the kind/enhancement Enhancement, improvement, extension label Jun 5, 2023
abdasgupta commented

image (2)


2 participants