HPA improvements #386

eero-t · 2024-09-03T16:31:39Z

Continuation of / depends on: #327

Description

Helm charts need fixes for HPA to work (with less changes) out of the box:

Increase probe timings to avoid scaled up CPU pods never getting to Ready state
Fix current (completely wrong) resource limit/request settings for inferencing components
Reduce default TGI scaling values to avoid issues when values are not properly configured
Change ChatQnA Helm chart to work with Helm installed version of Prometheus (WIP)
Related Helm README updates & fixes

Additionally PR also adds:

Support for enabling HPA-scaling in ChatQnA chart only for relevant subcharts

(Best reviewed by checking individual commits.)

Issues

n/a

Type of change

Bug fix (non-breaking change which fixes an issue)

Dependencies

n/a

Tests

Manual testing for the changes (before splitting them to separate commits & documenting them).

eero-t · 2024-09-03T16:49:35Z

Force pushed pre-commit CI fixes (as there were no review comments yet).

eero-t · 2024-09-03T16:54:49Z

If README.md HPA section is too long, it could be moved to its own HPA.md file.

lianhao · 2024-09-04T01:51:14Z

If README.md HPA section is too long, it could be moved to its own HPA.md file.

Yes, that would be a good idea.

eero-t · 2024-09-04T14:48:40Z

As to the generated manifests: https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector/config/HPA/

I think those should be thrown away rather than updated.

Using HPA with GMC when inferencing is done on CPU does not make much sense. GMC allows e.g. changing model at run-time, but when scaling the CPU services, it means that deployment resource requests etc should be updated accordingly (based on used model, data type used with it, TGI/TEI versions and underlying node resources), and I don't think GMC has any support for that.

(Using HPA with GMC for Gaudi accelerated services would be a bit different matter, but current HPA rules are just for CPU charts.)

irisdingbj · 2024-09-04T15:48:10Z

@eero-t To make HPA support in GMC is the target and @ichbinblau is working on this already. GMC allows to change the model dynamically but it will also support static pipiline.

eero-t · 2024-09-04T17:02:48Z

Not having used Helm before, I had not realized that gaudi-values.yaml overrides just specified parts from values.yaml file (in top-level and sub-charts), instead of replacing the values.yaml content completely. I.e. enabling HPA in values.yaml means it being enabled also when gaudi-values.yaml (or nv-values.yaml) is used.

This got me thinking...

If settings needed for scaling (resource requests corresponding to specified model, probe timings etc) were in separate files, user could enable HPA without needing to modify values.yaml files in top-level + sub-charts.

Enabling HPA would then look something like this, for CPU:

helm install chatqna ... -f chatqna/hpa-values.yaml -f common/tgi/cpu-values.yaml -f common/tei/cpu-values.yaml

And for Gaudi:

helm install chatqna ... -f chatqna/hpa-values.yaml -f chatqna/gaudi-values.yaml

Thoughts?

eero-t · 2024-09-04T17:21:10Z

Note: above does not fix the issue for (CPU) scaling, of scaled pods' resource requests needing to match their resource usage with the LLM model specified for them => overriding the model with a global Helm value, or by GMC, is not really a good idea as then there's a mismatch between those.

When Gaudi or GPU is used, that's less of a issue, as long as single device per Pod is OK. If larger models would require multiple devices, or smaller models would require (for better utilization / cost reasons) sharing GPUs between pods, then changing models would also be bad idea, if resource requests are not changed accordingly.

lianhao · 2024-09-05T01:21:28Z

Not having used Helm before, I had not realized that gaudi-values.yaml overrides just specified parts from values.yaml file (in top-level and sub-charts), instead of replacing the values.yaml content completely. I.e. enabling HPA in values.yaml means it being enabled also when gaudi-values.yaml (or nv-values.yaml) is used.

This got me thinking...

If settings needed for scaling (resource requests corresponding to specified model, probe timings etc) were in separate files, user could enable HPA without needing to modify values.yaml files in top-level + sub-charts.

Enabling HPA would then look something like this, for CPU:
helm install chatqna ... -f chatqna/hpa-values.yaml -f common/tgi/cpu-values.yaml -f common/tei/cpu-values.yaml
And for Gaudi:
helm install chatqna ... -f chatqna/hpa-values.yaml -f chatqna/gaudi-values.yaml
Thoughts?

In helm chart, values.yaml is alread used. Any addtional value files specified by -f <xxx.yaml> is just to add patch to the default values.yaml, not to replace it.

Since in component helm charts(those charts under common directory, HPA is disabled by default, i.e. in their values.yamlfiles HPA is set to false), there is no need to put HPA related settings in e2e helm chart's(i.e. chatqna) values.yaml file. What we need here is a separate hpa.yaml file in e2e helm chart, which enable the corresponding HPA for components and set default optimal configurations for the out-of-box e2e workload.

So helm install chatqna ... -f chatqna/hpa-values.yaml -f chatqna/gaudi-values.yaml seems a more propriate way to go, compared to helm install chatqna ... -f chatqna/hpa-values.yaml -f common/tgi/cpu-values.yaml -f common/tei/cpu-values.yaml. I don't think we should put separate hpa-values.yaml files in component helm chart, because it could be dependent on model and workload. So I would suggest put all the adjustable HPA related settings in hpa-values.yaml in top level e2e helm chart, in which we should also tell users to adjust its settings for their own model case if needed.

eero-t · 2024-09-05T13:59:01Z

Since in component helm charts(those charts under common directory, HPA is disabled by default, i.e. in their values.yamlfiles HPA is set to false), there is no need to put HPA related settings in e2e helm chart's(i.e. chatqna) values.yaml file.

If user does NOT use -f hpa-values.yaml, (disabled) HPA setting is still needed in values.yaml as Helm errors on custom metrics template.

Btw. Theresa just mentioned that Helm refuses to install PrometheusAdapter custom metrics config (for ChatQnA) over configMap that has been installed by another Helm chart, the Prometheus one... (I hadn't noticed this because I use Helm to generate manifest, and then apply that after reviewing the results, I don't install things directly with Helm. Also my Prometheus install comes from kube-prometheus manifests, not from Helm chart for that.)

=> I'm not sure how to best handle that for 1.0 milestone. I could test doing all installs directly with Helm and find some way to workaround any issues, but that will take at least few days. Another option is to document what I've actually tested ie. use Helm just to produce manifests, and installing those...

I would suggest put all the adjustable HPA related settings in hpa-values.yaml in top level e2e helm chart, in which we should also tell users to adjust its settings for their own model case if needed.

Because suitable subcomponent resource requests & probe timings differ between Gaudi and CPU (besides depending on model), and are needed for scaling to work in general (not just for HPA), I think they should be in separate file from generic HPA setup. I.e. top-level chart would have cpu-values.yaml that contains values that are relevant when subcharts use CPU, regardless of whether HPA is used:

ChatQnA TGI/TEI on CPU: -f cpu-values.yaml
- With HPA scaling: -f cpu-values.yaml -f hpa-values.yaml
ChatQnA TGI/TEI on Gaudi: -f gaudi-values.yaml
- With HPA scaling: -f gaudi-values.yaml -f hpa-values.yaml

Model should also be set in top-level chart device values file, not values.yaml because model (and data type used with it) dictates the resource usage, how much RAM it needs, does it fit into single device, or need to be sharded over multiple ones etc.

eero-t · 2024-09-05T18:12:24Z

FYI: I filed Prometheus-operator + Prometheus-adapter tickets on getting proper support for updating Prometheus-adapter, needed to properly fix custom metrics installation:

Update PrometheusAdapter config based on "MetricRules" objects prometheus-operator/kube-prometheus#2504
Adapter does not react to its config changes kubernetes-sigs/prometheus-adapter#678

That would allow reasonable automation for custom metrics, so I'd appreciate thumb-ups in the tickets. :-)

eero-t · 2024-09-05T19:38:56Z

I split HPA & CPU values to their own values files, and fixed merge conflict with main (which required force pushing). If those look fine, I can drop the Fix: commits, and last commit reverting them.

PS. In some later PR, extra variables can be added for HPA rules and TGI command line options, so that they match underlying HW better.

(Gaudi TGI can start much faster, so faster HPA scaling values could be used for Gaudi. There are Gaudi specific options that can provide significantly faster throughput for it, and I really hope there being also TGI options that can reduce its excruciatingly slow startup with CPU warmup. Newer CPUs could also use bfloat16 instead of float32, which would improve TGI perf a bit in addition to halving its memory usage.)

eero-t · 2024-09-05T19:52:57Z

Interestingly CI automatically started running tests for the new values files:

chatqna, xeon, cpu-values
chatqna, xeon, hpa-values

However, hpa-values requires Prometheus install first so that required API types are recognized. And on Xeon, while cpu-values works fine alone, hpa-values is supposed to be used only with cpu-values.

There was also unrelated (Gaudi guardrails) test failure due to:
FailedScheduling 4m40s (x2 over 10m) default-scheduler 0/2 nodes are available: 1 Insufficient habana.ai/gaudi

lianhao · 2024-09-06T02:05:07Z

Since in component helm charts(those charts under common directory, HPA is disabled by default, i.e. in their values.yamlfiles HPA is set to false), there is no need to put HPA related settings in e2e helm chart's(i.e. chatqna) values.yaml file.

If user does NOT use -f hpa-values.yaml, (disabled) HPA setting is still needed in values.yaml as Helm errors on custom metrics template.

Btw. Theresa just mentioned that Helm refuses to install PrometheusAdapter custom metrics config (for ChatQnA) over configMap that has been installed by another Helm chart, the Prometheus one... (I hadn't noticed this because I use Helm to generate manifest, and then apply that after reviewing the results, I don't install things directly with Helm. Also my Prometheus install comes from kube-prometheus manifests, not from Helm chart for that.)

=> I'm not sure how to best handle that for 1.0 milestone. I could test doing all installs directly with Helm and find some way to workaround any issues, but that will take at least few days. Another option is to document what I've actually tested ie. use Helm just to produce manifests, and installing those...

I would suggest put all the adjustable HPA related settings in hpa-values.yaml in top level e2e helm chart, in which we should also tell users to adjust its settings for their own model case if needed.

Because suitable subcomponent resource requests & probe timings differ between Gaudi and CPU (besides depending on model), and are needed for scaling to work in general (not just for HPA), I think they should be in separate file from generic HPA setup. I.e. top-level chart would have cpu-values.yaml that contains values that are relevant when subcharts use CPU, regardless of whether HPA is used:

ChatQnA TGI/TEI on CPU: -f cpu-values.yaml

With HPA scaling: -f cpu-values.yaml -f hpa-values.yaml

ChatQnA TGI/TEI on Gaudi: -f gaudi-values.yaml

With HPA scaling: -f gaudi-values.yaml -f hpa-values.yaml

Model should also be set in top-level chart device values file, not values.yaml because model (and data type used with it) dictates the resource usage, how much RAM it needs, does it fit into single device, or need to be sharded over multiple ones etc.

The top-leve values.yaml is meant for cpu cases without HPA. We want users to be able to install helm chart without applying addtional value files on any standard xeon servers if they want to have a try.

How about we do the followings:

# generic CPU cases without HPA:(this is equal to applying by -f values.yaml)
helm install chatqna chatqna --set global.HFTOKEN ...

# generic CPU cases with HPA:
helm install chatqna chatqna -f hpa-for-cpu-value.yaml --set global.HFTOKEN ...

# gaudi case without HPA
helm install chatqna chatqna -f gaudi-values.yaml --set global.HFTOKEN ...

# gaudi case with HPA
helm install chatqna chatqna -f guadi-values.yaml -f hpa-for-gaudi-values.yaml --set global.HFTOKEN...

As for the PrometheusAdapter custom metrics config, maybe we should have it documented here before prometheus-operator/kube-prometheus#2504 is resolved

lianhao · 2024-09-06T02:07:03Z

Interestingly CI automatically started running tests for the new values files:

chatqna, xeon, cpu-values

chatqna, xeon, hpa-values

However, hpa-values requires Prometheus install first so that required API types are recognized. And on Xeon, while cpu-values works fine alone, hpa-values is supposed to be used only with cpu-values.

There was also unrelated (Gaudi guardrails) test failure due to: FailedScheduling 4m40s (x2 over 10m) default-scheduler 0/2 nodes are available: 1 Insufficient habana.ai/gaudi

Please contact @daisy-ycguo about the CI to skip some value files in CI. We might need to come up a unified way of how to skip some cases, now we have nv-values.yaml to be skipped, and I expect more to come.

poussa · 2024-09-06T07:11:23Z

How about we do the followings:

# generic CPU cases without HPA:(this is equal to applying by -f values.yaml)
helm install chatqna chatqna --set global.HFTOKEN ...

# generic CPU cases with HPA:
helm install chatqna chatqna -f hpa-for-cpu-value.yaml --set global.HFTOKEN ...

# gaudi case without HPA
helm install chatqna chatqna -f gaudi-values.yaml --set global.HFTOKEN ...

# gaudi case with HPA
helm install chatqna chatqna -f guadi-values.yaml -f hpa-for-gaudi-values.yaml --set global.HFTOKEN...

As for the PrometheusAdapter custom metrics config, maybe we should have it documented here before prometheus-operator/kube-prometheus#2504 is resolved

This looks sane to me.

eero-t · 2024-09-06T18:28:11Z

As CPU values being in separate top-level chart (instead of subcharts) was fine, I dropped the resource & probe timing updates to common/*/values.yaml files, the commit reverting them, and rebased again to main. I.e. there were no code or doc changes in the rebase.

eero-t · 2024-09-06T19:33:37Z

The top-level values.yaml is meant for cpu cases without HPA.

I thought it was intended for Gaudi because it worked so badly on Xeon...

Without suitable probe timings and resources, out-of-the-box Xeon experience is rather awful, if user happens to run multiple different services, or does kubectl scale --replicas on the backends, regardless of whether HPA is used or not.

However, according to comments in TGI values.yaml file, OPEA does not want to specify resources by default, because somebody may want to try OPEA in a setup (minikube) that does not really have resources for it (I guess user could then switch to a smaller model, which would mean default resource values mismatching it).

Therefore I put CPU timings + resource values to a separate cpu-values.yaml file that.

Different CPU resource requests are needed for different models and different data types. I.e. if things should work out of the box, there need to be file per set of models, and data types specified for TGI & TEI, which in future could look something like this:

cpu-mistral7b-bgebase.yaml
cpu-some-other-models.yaml
etc

Note that because it's not possible to share Gaudi, case for that is simpler. One needs different resource spec only for models that do not fit into single Gaudi, but need to be sharded over several of them. If model needs to be split, only then CPU side resource usage can be relevant (due to CPU side DeepSpeed overhead).

We want users to be able to install helm chart without applying additional value files on any standard xeon servers if they want to have a try.

Extra file is needed for Gaudi and Nvidia, so it would be consistent to have one also for CPU.

# gaudi case with HPA
helm install chatqna chatqna -f guadi-values.yaml -f hpa-for-gaudi-values.yaml --set global.HFTOKEN...

Whether one uses Gaudis or CPUs is not a relevant distinction for HPA enabling and replica limits. If cluster has enough of Gaudis or CPUs, the same values can be used.

eero-t · 2024-09-06T20:52:14Z

Issues raised by Theresa when testing this PR, which must be fixed before merging:

Helm-installed Prometheus does not find ServiceMonitors
- => Add Chart values for Prometheus Helm install release name & namespace
Custom metrics config overwriting fails with Helm-installed Prometheus-adapter
- Did not find a way to handle that reasonable with OPEA Helm charts, and Prometheus upstream does not seem keen on doing anything with the adapter (except replace it)
- => decided it's best if Helm creates configMap which name does NOT match with Prometheus one, and README has instructions on how to copy that over Prometheus-adapter configMap (before user restarts adapter). That way it works same regardless of how Prometheus is installed

Other feedback from internal reviews:

Already merged HPA PR does not conform to Helm best practices (Add HPA support to ChatQnA #327)
- https://helm.sh/docs/chart_best_practices/templates/
- => rename custom metrics and HPA rule template files (+ check for empty values in them?)
Gaudi fine-tuning for HPA
- => Pre-existing resource requests in gaudi-values.yaml and probe timings in values.yaml files are probably fine for Gaudi, but would be good to verify
Support multiple separate HPA-scaled ChatQnA instances in same cluster
- => prefix custom metric names with ChatQnA instance name

Note: support for ChatQnA instances being in separate namespaces will definitely come only in some future PR. Fiddling with Prometheus RBAC rules is clearly out of scope for this one.

lianhao

This seems fine to me but we need to find out a way to unifying the naming of those helm value yaml files to meet the following scenarios, and make CICD to recognize those

value files for different workload feature variants: e.g. gaurdrails-values.yaml
value files for different optional feature: e.g. hpa-values.yaml,
value files for different vendor specific hw: e.g. gaudi-values.yaml
the combinations of the above 3

But this is another story, not related to HPA

And document need for specifying resources. Signed-off-by: Eero Tamminen <[email protected]>

Signed-off-by: Eero Tamminen <[email protected]>

https://helm.sh/docs/chart_best_practices/templates/ Signed-off-by: Eero Tamminen <[email protected]>

lianhao · 2024-09-10T03:14:47Z

I believe the 10 mins wait time for deployment ready(ROLLOUT_TIMEOUT_SECONDS) in helm CI test is too short for cpu-values.yaml configuration sometimes, because sometimes we have multiple CI tests running on the same node, especially for GMC

So that HPA scaling rules use custom metrics for correct set of TGI/TEI instances. Signed-off-by: Eero Tamminen <[email protected]>

* Use custom metrics configMap name that does NOT match one installed by Prometheus, because Helm does not allow overwriting object created by another Helm chart (like using manifests would). * Add Prometheus release name to serviceMonitors. Otherwise Helm installed Prometheus does not find serviceMonitors. Alternative would be using: prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false Reported by Theresa. Signed-off-by: Eero Tamminen <[email protected]>

eero-t · 2024-09-10T15:33:23Z

Helm-installed Prometheus does not find ServiceMonitors

Custom metrics config overwriting fails with Helm-installed Prometheus-adapter

Already merged HPA PR does not conform to Helm best practices

Support multiple separate HPA-scaled ChatQnA instances in same cluster

Pushed fixes for these issues, meaning:

release label for monitors
Custom metric install has now multiple steps, not just adapter restart
Renamed template files to best practices
Custom metrics have not chart specific prefix

Theresa is working on GMC support for HPA, so it could come after 1.0, but not as part of this PR.

Thank you @eero-t @ichbinblau would you confirm we need HPA manifest or not fot GMC to support HPA?

Discussed with Theresa, and I will be doing minimal manifest update (later today). I.e. fix non-HPA TGI/TEI deployments, and reduce HPA/ dir content to fixed HPA + custom metric rules and serviceMonitors. Deployment resource limits will come later for GMC.

Handling manifest generation properly for deployments required changing replica and terminationTimeout settings to depend on something else than HPA. Termination timeout really depends on whether pod is accelerated, not whether HPA is used, so there's no new accelDevice value for that (empty meaning CPU i.e. non-accelerated / slow).

Timeout is question of Pod slowness i.e. is it accelerated, not HPA. Replicas need to be set only when count is set to something else than the default value (1). This will work also for HPA, while making GMC manifest generation easier. Signed-off-by: Eero Tamminen <[email protected]>

How to install Prometheus with Helm and fix doc issues. Having a manual step that replaces existing PrometheusAdapter config with the generated one should work in all cases (it's a workaround for Helm refusing to overwrite object created by another chart). Signed-off-by: Eero Tamminen <[email protected]>

Instead of replacing deployments with HPA/ ones, apply fixes directly to the normal deployment manifests. K8s default is 1 replica, so that can be dropped, which works also better with HPA. Because resource requests are model specific, and GMC is used to change model, HPA/ manifests won't help with that, GMC needs to take care of that (eventually). Signed-off-by: Eero Tamminen <[email protected]>

And current Helm charts contents. Signed-off-by: Eero Tamminen <[email protected]>

eero-t · 2024-09-10T17:30:11Z

Merged pre-commit.ci changes to earlier commits, applied several fixes to the documentation and updated GMC manifests.

GMC manifest update was done manually, because current Helm charts do not support differentiating deployments enough based on whether it's an accelerated (= much faster) deployment. Somebody would need to change all deployment templates to apply proper timings based on whether CPU or accelerator (e.g. Gaudi) is used, similarly to how I did with termination timeouts.

However, that's a generic problem in the charts, so it belongs to some other PR, not into this v1.0 milestone one.

(While the problem is generic, its effect will be multiplied when number of pods is scaled, like happens with HPA.)

@irisdingbj is the PR now OK from your side?

Variable was left-over from "Fixes for Helm installed Prometheus version". Signed-off-by: Eero Tamminen <[email protected]>

Signed-off-by: Eero Tamminen <[email protected]>

lianhao

@eero-t Please see my embedded comments. @yongfengdu any comment on this?

lianhao · 2024-09-11T00:42:54Z

microservices-connector/config/manifests/tei.yaml

@@ -110,12 +110,14 @@ spec:
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
+            timeoutSeconds: 2


The manifest files are under microservices-connector/config/manifests directory is generated by helm-charts/update_manifests.sh automatically. So if you want to change the default probe settings, please change it in corresponding component's helm chart, i.e. helm-chart/common/tei/values.yaml. Otherwise, these values will be easily overwritten.

@lianhao Yes. Using cpu-values.yaml in ChatQnA papers over the (CPU) issues with current Helm charts, but a proper fix would be needed for generating the GMC manifests. However, that is not HPA specific problem, and requires more discussion[1][2] so IMHO it was clearly out of scope for this PR (although scaling makes these problems much more visible).

[1] What's IMHO needed to fix things for probe timings:

Common components' value files include different probe timing sections for CPU and for accelerators

Their deployment templates select one based on .Values.accelDevice value (empty for CPU)

All <device>-values.yaml files set appropriate <subchart>.accelDevice value (not just ChatQnA)

GMC device variant manifests are generated for all relevant components, not just TGI

(I don't think probe timings would need to be fine-tuned based on which model is used.)

[2] What's IMHO needed to fix resource requests:

Current sub-optimal component arguments are optimized, and resulting resource requirements are profiled, for all relevant models

For example on SPR, CPU TGI data type can be set to bfloat16, which halves its memory usage

Observability support + suitable Grafana dashboards will help with profiling

Instead of subcomponent model & corresponding resources being specified in top-level chart, helm install command uses suitable model+resource+args file from given component, like this:

-f common/tgi/gaudi/neural-chat-7b.yaml

-f common/teirerank/gaudi/bge-reranker-base.yaml

-f common/tei/cpu/bge-base.yaml

-f common/data-prep/gpu/values.yaml

(These would provide values with subchart prefix/heading so they can be used from top-level charts)

There could also be a global option for ignoring (CPU side) resource requests that can be used when things need to be re-profiled

GMC applies resource specs generated from these, when user changes model

If there are combinations of above which are common between different top-level charts, I would suggest Makefile merging relevant ones to common files (e.g. -f common/gaudi-model-defaults.yaml), to avoid duplicating things that may need to be updated whenever args, model, or image versions get updated.

@eero-t Thanks for the suggestion. I think we need further discussion about this. I created the draft PR for us to further discuss. Please discuss at #431

Sync the mismatch between helm chart and GMC manifests introduced by PR opea-project#386 Signed-off-by: Lianhao Lu <[email protected]>

Sync the mismatch between helm chart and GMC manifests introduced by PR #386 Signed-off-by: Lianhao Lu <[email protected]>

eero-t force-pushed the hpa-improvements branch 2 times, most recently from 1989441 to 3b451ba Compare September 3, 2024 16:47

eero-t requested review from yongfengdu and lianhao as code owners September 4, 2024 13:58

eero-t force-pushed the hpa-improvements branch from 518b519 to d096521 Compare September 4, 2024 14:00

kevinintel mentioned this pull request Sep 5, 2024

K8s Resource Management: autoscaling #144

Closed

kevinintel added the milestone1.0 label Sep 5, 2024

eero-t force-pushed the hpa-improvements branch 2 times, most recently from 656aee0 to cba8b18 Compare September 5, 2024 19:19

chensuyue added this to the v1.0 milestone Sep 6, 2024

eero-t force-pushed the hpa-improvements branch from cba8b18 to b43e058 Compare September 6, 2024 18:22

lianhao approved these changes Sep 9, 2024

View reviewed changes

eero-t added 4 commits September 10, 2024 09:53

Move HPA instructions to its own document

6a63cb1

And document need for specifying resources. Signed-off-by: Eero Tamminen <[email protected]>

Use separate hpa-values.yaml for enabling HPA

8266b66

Signed-off-by: Eero Tamminen <[email protected]>

Add separate cpu-values.yaml for CPU timings & resources

5075e5c

Signed-off-by: Eero Tamminen <[email protected]>

Rename HPA template files based on Helm best practices

2c11fe8

https://helm.sh/docs/chart_best_practices/templates/ Signed-off-by: Eero Tamminen <[email protected]>

lianhao force-pushed the hpa-improvements branch from 052b80d to 2c11fe8 Compare September 10, 2024 01:53

eero-t added 2 commits September 10, 2024 18:23

Use chart specific prefix for custom metrics

327ceb5

So that HPA scaling rules use custom metrics for correct set of TGI/TEI instances. Signed-off-by: Eero Tamminen <[email protected]>

eero-t added 4 commits September 10, 2024 19:47

GMC manifests: update HPA ones to match deployment/service ones

f46f733

And current Helm charts contents. Signed-off-by: Eero Tamminen <[email protected]>

eero-t force-pushed the hpa-improvements branch from 720538a to f46f733 Compare September 10, 2024 17:21

eero-t requested review from zhlsunshine and KfreeZ as code owners September 10, 2024 17:21

Pre-CI fixes + drop obsolete HPA variable from ChatQnA Chart

7eb3690

Variable was left-over from "Fixes for Helm installed Prometheus version". Signed-off-by: Eero Tamminen <[email protected]>

eero-t force-pushed the hpa-improvements branch from 938a0f7 to 7eb3690 Compare September 10, 2024 17:42

HPA doc example command fixes

fa490bc

Signed-off-by: Eero Tamminen <[email protected]>

irisdingbj approved these changes Sep 10, 2024

View reviewed changes

lianhao reviewed Sep 11, 2024

View reviewed changes

yongfengdu merged commit 8d86fff into opea-project:main Sep 11, 2024
20 checks passed

eero-t mentioned this pull request Sep 12, 2024

Document / support for using BFLOAT16 with (Xeon) TGI service opea-project/GenAIExamples#330

Closed

eero-t mentioned this pull request Sep 24, 2024

Support alternative metrics on accelerated TGI / TEI instances #454

Merged

1 task

lianhao added a commit to lianhao/GenAIInfra that referenced this pull request Sep 25, 2024

helm/manifest: Sync HPA related K8S probe settings

bef811c

Sync the mismatch between helm chart and GMC manifests introduced by PR opea-project#386 Signed-off-by: Lianhao Lu <[email protected]>

lianhao mentioned this pull request Sep 26, 2024

helm/manifest: Sync HPA related K8S probe settings #459

Merged

1 task

lianhao added a commit to lianhao/GenAIInfra that referenced this pull request Oct 8, 2024

helm/manifest: Sync HPA related K8S probe settings

b052180

Sync the mismatch between helm chart and GMC manifests introduced by PR opea-project#386 Signed-off-by: Lianhao Lu <[email protected]>

daisy-ycguo pushed a commit that referenced this pull request Oct 11, 2024

helm/manifest: Sync HPA related K8S probe settings (#459)

c399578

Sync the mismatch between helm chart and GMC manifests introduced by PR #386 Signed-off-by: Lianhao Lu <[email protected]>

eero-t deleted the hpa-improvements branch October 18, 2024 15:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HPA improvements #386

HPA improvements #386

eero-t commented Sep 3, 2024 •

edited

Loading

eero-t commented Sep 3, 2024

eero-t commented Sep 3, 2024

lianhao commented Sep 4, 2024

eero-t commented Sep 4, 2024

irisdingbj commented Sep 4, 2024

eero-t commented Sep 4, 2024

eero-t commented Sep 4, 2024 •

edited

Loading

lianhao commented Sep 5, 2024

eero-t commented Sep 5, 2024

eero-t commented Sep 5, 2024 •

edited

Loading

eero-t commented Sep 5, 2024

eero-t commented Sep 5, 2024 •

edited

Loading

lianhao commented Sep 6, 2024

lianhao commented Sep 6, 2024

poussa commented Sep 6, 2024

eero-t commented Sep 6, 2024 •

edited

Loading

eero-t commented Sep 6, 2024 •

edited

Loading

eero-t commented Sep 6, 2024

lianhao left a comment •

edited

Loading

lianhao commented Sep 10, 2024 •

edited

Loading

eero-t commented Sep 10, 2024

eero-t commented Sep 10, 2024 •

edited

Loading

lianhao left a comment

lianhao Sep 11, 2024

eero-t Sep 11, 2024

lianhao Sep 13, 2024 •

edited

Loading

HPA improvements #386

HPA improvements #386

Conversation

eero-t commented Sep 3, 2024 • edited Loading

Description

Issues

Type of change

Dependencies

Tests

eero-t commented Sep 3, 2024

eero-t commented Sep 3, 2024

lianhao commented Sep 4, 2024

eero-t commented Sep 4, 2024

irisdingbj commented Sep 4, 2024

eero-t commented Sep 4, 2024

eero-t commented Sep 4, 2024 • edited Loading

lianhao commented Sep 5, 2024

eero-t commented Sep 5, 2024

eero-t commented Sep 5, 2024 • edited Loading

eero-t commented Sep 5, 2024

eero-t commented Sep 5, 2024 • edited Loading

lianhao commented Sep 6, 2024

lianhao commented Sep 6, 2024

poussa commented Sep 6, 2024

eero-t commented Sep 6, 2024 • edited Loading

eero-t commented Sep 6, 2024 • edited Loading

eero-t commented Sep 6, 2024

lianhao left a comment • edited Loading

Choose a reason for hiding this comment

lianhao commented Sep 10, 2024 • edited Loading

eero-t commented Sep 10, 2024

eero-t commented Sep 10, 2024 • edited Loading

lianhao left a comment

Choose a reason for hiding this comment

lianhao Sep 11, 2024

Choose a reason for hiding this comment

eero-t Sep 11, 2024

Choose a reason for hiding this comment

lianhao Sep 13, 2024 • edited Loading

Choose a reason for hiding this comment

eero-t commented Sep 3, 2024 •

edited

Loading

eero-t commented Sep 4, 2024 •

edited

Loading

eero-t commented Sep 5, 2024 •

edited

Loading

eero-t commented Sep 5, 2024 •

edited

Loading

eero-t commented Sep 6, 2024 •

edited

Loading

eero-t commented Sep 6, 2024 •

edited

Loading

lianhao left a comment •

edited

Loading

lianhao commented Sep 10, 2024 •

edited

Loading

eero-t commented Sep 10, 2024 •

edited

Loading

lianhao Sep 13, 2024 •

edited

Loading