Prometheus workload loader #266

LeaveMyYard · 2024-04-22T12:29:03Z

Implemented ability to select an alternative workload loader
Implemented Prometheus workload loader
Refactored Kube API workload loader: moved separate logic (for each Kind) to a separate class
List clusters on centralized prometheus
Scanning multiple clusters in prometheus mode?
Test running from robusta UI
Test prometheus mode on centralized prometheus
A lot of testing
Fix tests (mocks are now broken)

Testing issues found:

For some reason auto-discovery does not work
Prometheus mode does not gather HPA data
Something is now broken with HPAKey

Co-authored-by: Megrez Lu <[email protected]>


deutschj · 2024-05-22T10:41:21Z

When testing using the prometheus mode on a centralized Prometheus (VictoriaMetrics), I encountered the following errors.
I specified the labels --prometheus-label and --prometheus-cluster-label according to the docs:

python krr.py simple -p https://vmauth.xxx.com --prometheus-auth-header "<redacted>" --prometheus-label cluster -l k8s0-q --mode prometheus

To me it seems that the query includes the wrong label here, and should be avg by(cluster) instead of using the cluster name.

The name is misleading. Before this change we had both --prometheus-cluster-label and --prometheus-label which referred to very different things, leading to a bug in the code (to be fixed in the next commit). We still support -l and have added support for "--prometheus-cluster-value" which is what `-l` really represents.
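The difference between the label key and the label value can be sketched as follows. This is only an illustration of the naming, not KRR's actual query-building code: `cluster_selector` and `avg_by_cluster` are hypothetical helper names, and `up` is used simply because it is a metric every Prometheus exposes.

```python
def cluster_selector(cluster_key: str, cluster_value: str) -> str:
    """Return a PromQL label matcher such as cluster="k8s0-q"."""
    return f'{cluster_key}="{cluster_value}"'


def avg_by_cluster(metric: str, cluster_key: str, cluster_value: str) -> str:
    """Filter on the label value, but group by the label key (hypothetical helper)."""
    selector = cluster_selector(cluster_key, cluster_value)
    return f"avg by ({cluster_key}) ({metric}{{{selector}}})"


# The key ("cluster") goes into `by (...)`; the value ("k8s0-q") goes into the matcher.
query = avg_by_cluster("up", "cluster", "k8s0-q")
print(query)
```

Mixing up the two arguments would produce `avg by (k8s0-q)`, which matches the kind of error reported above.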

arikalon1 · 2024-06-14T05:16:06Z

README.md

@@ -430,7 +430,7 @@ If your Prometheus monitors multiple clusters we require the label you defined f
 For example, if your cluster has the Prometheus label `cluster: "my-cluster-name"`, then run this command:

 ```sh
-krr.py simple --prometheus-label cluster -l my-cluster-name
+krr.py simple --prometheus-cluster-key cluster -l my-cluster-name
 ```

@aantn I think this requires a change on the robusta-runner as well
see here

@arikalon1 it should be fine. I'm not deprecating --prometheus-label, just adding another option with a name that makes more sense.

I did deprecate --prometheus-cluster-label but the runner doesn't pass that by default.

arikalon1 · 2024-06-14T05:16:20Z

robusta_krr/main.py

@@ -138,15 +138,16 @@ def run_strategy(
                ),
                prometheus_cluster_label: Optional[str] = typer.Option(
                    None,
-                    "--prometheus-cluster-label",
+                    "--prometheus-cluster-value",


Yes, this one I did change but if I understand the runner code correctly, it uses -l which is still supported and equivalent to this flag. So no change needed.
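As a rough sketch of why no runner change should be needed, the aliasing described in this thread can be modelled like this. It uses the standard-library `argparse` rather than the `typer` options in `robusta_krr/main.py`, so it is an analogy and not the real implementation; the `dest` names and the `krr-sketch` program name are made up for illustration, while the flag names come from the comments above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts both the old and the new flag spellings."""
    parser = argparse.ArgumentParser(prog="krr-sketch")
    # Label name (key): the new spelling plus the older one that is still supported.
    parser.add_argument(
        "--prometheus-cluster-key", "--prometheus-label",
        dest="cluster_key", default=None,
    )
    # Label value: -l remains, with a long name that says what it really represents.
    parser.add_argument(
        "-l", "--prometheus-cluster-value",
        dest="cluster_value", default=None,
    )
    return parser


old_style = build_parser().parse_args(
    ["--prometheus-label", "cluster", "-l", "my-cluster-name"]
)
new_style = build_parser().parse_args(
    ["--prometheus-cluster-key", "cluster",
     "--prometheus-cluster-value", "my-cluster-name"]
)
print(old_style == new_style)
```

Both invocations populate the same two destinations, so a caller that only ever passes `-l` is unaffected by the rename.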

See #266 (comment)

aantn · 2024-06-14T05:18:32Z

Hi @deutschj,
You're correct of course! I've pushed a fix. Can you please test?

aantn · 2024-06-14T05:23:20Z

@deutschj please note that I changed the name of the flags a little -- see --help if necessary.

deutschj · 2024-06-27T06:53:26Z

@aantn Sure, thanks a lot for working on the bugfix!

When testing with the options --prometheus-cluster-key and -l I encountered the following error:

krr git:(prometheus-workload-loader) python krr.py simple --prometheus-url https://vmauth.xxx.com --prometheus-auth-header "<redacted>" --mode prometheus --prometheus-cluster-key cluster -l k8s0-q

Running Robusta's KRR (Kubernetes Resource Recommender) 1.8.2-dev
Using strategy: Simple
Using formatter: table
A newer version of KRR is available: v1.11.0

[08:22:56] INFO     Connecting using Prometheus, will load the kubeconfig.
           INFO     Using Prometheus at https://vmauth.xxx.com
           INFO     Prometheus found
           INFO     Prometheus connected successfully            
[08:22:57] INFO     Clusters available: k8s0-q, <cluster1>, <cluster2>, [...]
           CRITICAL Cannot scan multiple clusters for this prometheus, Rerun with the flag `-c <cluster>` where <cluster> is one of ['k8s0-q', '<cluster1>', '<cluster2>', [...]]

However, when providing only the -l option without the --prometheus-cluster-key, KRR starts to generate recommendations for the selected cluster. When generating, the following warnings are displayed for every existing workload:

WARNING  Prometheus returned no PercentileCPULoader metrics for StatefulSet xxx
WARNING  Prometheus returned no MaxMemoryLoader metrics for StatefulSet xxx
WARNING  Prometheus returned no CPUAmountLoader metrics for StatefulSet xxx
WARNING  Prometheus returned no MemoryAmountLoader metrics for StatefulSet xxx
INFO     Calculated recommendations for StatefulSet xxx (using 4 metrics)

The table of generated resource recommendations at the end is empty, though.
So it looks like listing the existing workloads in the cluster now works correctly, but fetching the corresponding metrics from the centralized Prometheus does not yet.
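One way to narrow this down is to ask the centralized Prometheus directly whether the usage series carry the expected cluster label. The snippet below only builds the request URL for the standard Prometheus HTTP API endpoint `/api/v1/query` and performs no network call. `build_query_url` is a hypothetical helper, and both the metric name `container_cpu_usage_seconds_total` and the assumption that the endpoint sits directly under the base URL may not match this particular setup.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (hypothetical helper)."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# Count CPU usage series for the selected cluster. An empty result would suggest
# that the label key or value does not match what the metrics actually carry.
promql = 'count by (cluster) (container_cpu_usage_seconds_total{cluster="k8s0-q"})'
url = build_query_url("https://vmauth.xxx.com", promql)
print(url)
```

The resulting URL can be fetched with the same auth header that is passed to KRR via --prometheus-auth-header.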

aantn · 2024-06-28T21:04:00Z

@deutschj does it work if you provide --prometheus-cluster-key, -l, and -c with -c set to the same value as -l?

Fixes #301

deutschj · 2024-07-02T10:57:36Z

@aantn Unfortunately not, no - this yields the same error as above, CRITICAL Cannot scan multiple clusters for this prometheus, Rerun with the flag '-c <cluster>' where <cluster> is one of [...]

Pionerd · 2024-07-08T08:04:19Z

Would love to see this merged; I have been using KRR from this feature branch for some time now.

aantn · 2024-07-08T11:59:30Z

Hey, we're planning to get it merged, but no exact ETA yet.

wad-hongsumin · 2024-08-12T14:00:12Z

hello. Are there any plans for this branch to be merged?

arikalon1 · 2024-08-12T14:24:18Z

hey @wad-hongsumin

We do plan to merge it, hopefully soon

wad-hongsumin · 2024-08-12T22:47:54Z

@arikalon1
thank you. I'm hoping this PR gets merged quickly.

Add multiple workload loaders, refactor kubeapi workload loader

4fedd82

LeaveMyYard self-assigned this Apr 22, 2024

LeaveMyYard and others added 5 commits April 22, 2024 18:30

Moved the logic from #93 for a new refined structure

c7ad1cd

Co-authored-by: Megrez Lu <[email protected]>

Implement remaining kinds in prometheus workload loader

bf90978

Filter Cronjob-created jobs from display

7cd0c59

Fix cluster selector

4044b4a

Minor bug fix

b9a62a0

LeaveMyYard mentioned this pull request Apr 26, 2024

Support prom discovery #93

Closed

3 tasks

LeaveMyYard added 4 commits April 29, 2024 23:18

BaseClusterLoader, class structure change, not finished

7e8f1f4

Finished structure changes and workload loaders

4c1f5c9

PrometeusClusterLoader.list_clusters implementation

7124c80

Minor additional logging improvements

d1ad17d

LeaveMyYard requested a review from aantn April 30, 2024 15:56

LeaveMyYard marked this pull request as ready for review April 30, 2024 15:56

LeaveMyYard and others added 11 commits April 30, 2024 18:59

Minor debug comment

f7d8412

Merge branch 'main' into prometheus-workload-loader

09c372b

Fix prometheus auto-discovery

59cc29d

Merge branch 'prometheus-workload-loader' of https://github.com/robus…

d4adcf8

…ta-dev/krr into prometheus-workload-loader

Logging improvement

eb84c95

Add HPA detection for prometheus mode

e350084

Fix tests

d4e09b0

Rework cluster selector for prometheus mode

7b6be35

Remove test raise

dce207f

Fix HPAKey

43ffb7f

One more HPAKey fix

2403898

This was referenced May 7, 2024

Provide recommendations for historical deployments #171

Open

Support metrics-based workload discovery #59

Open

aantn mentioned this pull request May 14, 2024

Recommendations missing for some sidecars #276

Open

arikalon1 reviewed Jun 14, 2024


Bug fix - thank you @deutschj

24af7f5

See #266 (comment)

add TODO

9fda8be

aantn added 2 commits July 1, 2024 10:33

Fix ArgoRollouts (#308)

f71abd1

Prevent single errors from failing scan (#307)

66389e6

Fixes #301
