Autoscaling selenium grid on kubernetes with scaledjobs #1854

Merged (7 commits) on Jun 29, 2023

msvticket
Contributor

@msvticket msvticket commented May 23, 2023

Description

This PR builds on #1714 but adds a few features:

  • Install KEDA automatically
  • Support scaling with ScaledJobs (the default when autoscaling is enabled)
  • Set the environment variables SE_NODE_GRID_URL and DRAIN_AFTER_SESSION_COUNT automatically
  • Set graphqlurl automatically based on configuration
  • Refactor the pod templates into a named template
  • Conditionally add a preStop hook that drains the node on termination

Motivation and Context

Compared to #1714, I want to make it easier to get started with autoscaling. A basic setup should need nothing more than setting autoscaling.enabled to true.
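
As a minimal sketch of that basic setup, the snippet below writes a values override and shows the install command as a comment. The file name, the release name selenium-grid, and the repo alias myrepo are assumptions for illustration. The key autoscaling.scalingType with the value job is taken from the chart template quoted in an error message later in this thread; it should be optional if ScaledJobs are the default.

```shell
# Write a minimal values override that turns on autoscaling.
# The file name is hypothetical.
cat > autoscaling-values.yaml <<'EOF'
autoscaling:
  enabled: true
  # Optional: ScaledJob-based scaling is described as the default.
  scalingType: job
EOF

# Install with the override (release name and repo alias are assumptions):
#   helm install selenium-grid myrepo/selenium-grid -f autoscaling-values.yaml
echo "wrote autoscaling-values.yaml"
```

Passing --set autoscaling.enabled=true on the command line should be equivalent for the single-key case.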

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@msvticket msvticket marked this pull request as ready for review May 23, 2023 15:47
@msvticket
Contributor Author

To test the selenium-grid Helm chart with this PR's changes, you can get it from the repo https://jenkins-x-charts.github.io/repo. The latest version as of now is 4.12.0.

@msvticket msvticket changed the title Kubernetes dynamic grid with scaledjobs Autoscaling selenium grid on kubernetes with scaledjobs May 24, 2023
prashanth-volvocars and others added 3 commits May 25, 2023 18:45
Autoscale selenium browser nodes running in kubernetes
based on the request pending in session queue using KEDA.
Toggle autoscaling on/off using 'autoscalingEnabled' option
in helm charts.
and make them the default KEDA scaling type
install keda automatically
set SE_NODE_GRID_URL and DRAIN_AFTER_SESSION_COUNT automatically
Set graphqlurl automatically
refactor out pod templates to named template
conditionally adding preStop hook for deregistering node

Signed-off-by: Mårten Svantesson <[email protected]>
@win5923

win5923 commented May 26, 2023

When setting 'autoscaling.enabled' to true, KEDA is not automatically installed.

Error: UPGRADE FAILED: [resource mapping not found for name: "selenium-chrome-node" namespace: "devops-tools" from "": no matches for kind "ScaledJob" in version "keda.sh/v1alpha1"
ensure CRDs are installed first, resource mapping not found for name: "selenium-edge-node" namespace: "devops-tools" from "": no matches for kind "ScaledJob" in version "keda.sh/v1alpha1"
ensure CRDs are installed first, resource mapping not found for name: "selenium-firefox-node" namespace: "devops-tools" from "": no matches for kind "ScaledJob" in version "keda.sh/v1alpha1"
ensure CRDs are installed first]

@msvticket
Contributor Author

Well, it works for me. What are your steps for installing selenium-grid?

@win5923

win5923 commented May 26, 2023

Install the Helm chart and export its values:

helm repo add myrepo https://jenkins-x-charts.github.io/repo/
helm install selenium-grid myrepo/selenium-grid
helm get values selenium-grid -a > value.yaml

Change "autoscaling.enabled" to true and upgrade:

helm upgrade selenium-grid myrepo/selenium-grid -f value.yaml

@win5923

win5923 commented May 29, 2023

Sorry, this is my new error when autoscaling is enabled. Is something wrong with my steps?

Error: UPGRADE FAILED: template: selenium-grid/templates/firefox-node-deployment.yaml:41:19: executing "selenium-grid/templates/firefox-node-deployment.yaml" at <tpl (toYaml .) $>: error calling tpl: error during tpl function execution for "- name: DRAIN_AFTER_SESSION_COUNT\n  value: '{{- and (eq (include \"seleniumGrid.useKEDA\" .) \"true\") (eq .Values.autoscaling.scalingType\n    \"job\") | ternary \"1\" \"0\" -}}'\n- name: SE_NODE_GRID_URL\n  value: '{{ include \"seleniumGrid.url\" .}}'": template: selenium-grid/templates/firefox-node-deployment.yaml:2:23: executing "selenium-grid/templates/firefox-node-deployment.yaml" at <include "seleniumGrid.useKEDA" .>: error calling include: template: no template "seleniumGrid.useKEDA" associated with template "gotpl"

@msvticket
Contributor Author

Well, I'm not really familiar with using the helm command directly to install stuff. I have always used a declarative approach through Jenkins X. Others use similar tools like ArgoCD. These tools work around the problems with handling CRDs that the helm commands have.

That said, it's not that strange that it doesn't work, since you used the upgrade command for something that requires the chart to be installed. Had you added --set autoscaling.enabled=true to the install command, it would have worked.
For more details, see:

https://helm.sh/docs/chart_best_practices/custom_resource_definitions/#method-1-let-helm-do-it-for-you
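
Both errors reported so far reduce to the API server not knowing the ScaledJob kind. A small check like the following, run before an install or upgrade, can make that visible. It is only a sketch: the function name is hypothetical, and it assumes kubectl is configured for the target cluster. The CRD name scaledjobs.keda.sh follows from the kind ScaledJob in the keda.sh API group shown in the error message.

```shell
# Returns 0 only when the ScaledJob CRD is registered in the cluster.
# (Function name is hypothetical.)
keda_crd_present() {
  kubectl get crd scaledjobs.keda.sh >/dev/null 2>&1
}

if keda_crd_present; then
  echo "ScaledJob CRD found; ScaledJob resources can be created"
else
  echo "ScaledJob CRD missing; KEDA (or its CRDs) must be installed first"
fi
```

If the check reports the CRD as missing, the "ensure CRDs are installed first" error is the expected outcome of applying a ScaledJob manifest.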

As for the error no template "seleniumGrid.useKEDA" associated with template "gotpl", I don't understand that at all. Could it have to do with the version of Helm? Which version are you using?

@win5923

win5923 commented May 31, 2023

I understand what you said, and I used 'helm install' with --set autoscaling.enabled=true, but the CRD issue remains.

helm version: v3.11.3

helm install selenium-grid myrepo/selenium-grid -n devops-tools --set autoscaling.enabled=true
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [resource mapping not found for name: "selenium-chrome-node" namespace: "devops-tools" from "": no matches for kind "ScaledJob" in version "keda.sh/v1alpha1"
ensure CRDs are installed first, resource mapping not found for name: "selenium-edge-node" namespace: "devops-tools" from "": no matches for kind "ScaledJob" in version "keda.sh/v1alpha1"
ensure CRDs are installed first, resource mapping not found for name: "selenium-firefox-node" namespace: "devops-tools" from "": no matches for kind "ScaledJob" in version "keda.sh/v1alpha1"
ensure CRDs are installed first]

helm seems to normally install subcharts after the current chart, so trying this workaround
@msvticket
Contributor Author

Ah, there seems to be another limitation in Helm: subcharts are installed after the main chart. But there is a way to delay the installation of resources. I have added this now, so if you run helm repo update and then try again, it should work.
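
One way Helm can delay a resource is a chart hook, which is consistent with a later "failed post-install" message in this thread that mentions a hook for chrome-node-scaledjobs.yaml. The following sketches what such metadata might look like. It is not the chart's actual template: the output file name and the exact hook events are assumptions, and the spec is left out.

```shell
# Write a sketch of ScaledJob metadata carrying a Helm hook annotation.
# "helm.sh/hook" is a standard Helm annotation; the events listed are assumed.
cat > scaledjob-hook-sketch.yaml <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: selenium-chrome-node
  annotations:
    # Create this resource only after the release's regular resources
    # have been applied.
    "helm.sh/hook": post-install,post-upgrade
# spec (jobTargetRef, triggers) omitted in this sketch
EOF
echo "wrote scaledjob-hook-sketch.yaml"
```

A trade-off worth keeping in mind is that Helm does not manage hook resources as part of the release in the same way as regular templates.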

@msvticket
Contributor Author

While working on support for video recording, I noticed how impractical it was that I had put default values in extraEnvironmentVariables. That means things break when extraEnvironmentVariables is overridden without setting all of the values again. I'll change that.
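
The underlying reason is that Helm replaces list values wholesale when they are overridden instead of merging them. As a sketch of what a user would have had to write under that layout, the override below adds one variable while repeating the two defaults quoted in the error message earlier in this thread. The firefoxNode key, the file name, and MY_EXTRA_VAR are assumptions for illustration.

```shell
# Hypothetical override: add MY_EXTRA_VAR while repeating the two chart
# defaults, because overriding a list discards the entries not repeated.
cat > firefox-env-values.yaml <<'EOF'
firefoxNode:
  extraEnvironmentVariables:
    - name: DRAIN_AFTER_SESSION_COUNT
      value: '{{- and (eq (include "seleniumGrid.useKEDA" .) "true") (eq .Values.autoscaling.scalingType "job") | ternary "1" "0" -}}'
    - name: SE_NODE_GRID_URL
      value: '{{ include "seleniumGrid.url" .}}'
    - name: MY_EXTRA_VAR
      value: "example"
EOF
echo "wrote firefox-env-values.yaml"
```

Leaving out the first two entries would silently drop them, which is the breakage described above.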

Member

@diemol diemol left a comment

Thank you for this, @msvticket!

@diemol
Member

diemol commented Jun 29, 2023

Due to the previous conflicts between this PR and trunk and how the updates have been propagated, it seems quite complicated to merge the commit history as it is.

I will squash and merge, and this will keep the authors there.

Also, thank you, @prashanth-volvocars!

@diemol diemol merged commit f0bbfe0 into SeleniumHQ:trunk Jun 29, 2023
@DeepKandey

@msvticket I tried to install the Helm chart with autoscaling enabled, but KEDA is not getting installed.

E0718 09:52:28.784838 25660 memcache.go:287] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0718 09:52:28.902582 25660 memcache.go:121] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Error: INSTALLATION FAILED: failed post-install: unable to build kubernetes object for deleting hook selenium-grid/templates/chrome-node-scaledjobs.yaml: resource mapping not found for name: "selenium-chrome-node" namespace: "default" from "": no matches for kind "ScaledJob" in version "keda.sh/v1alpha1"
ensure CRDs are installed first

I think the workaround is still not working.

@DeepKandey

Hey, can someone please help me with this issue?

@win5923

win5923 commented Aug 1, 2023

Run kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml to see what's wrong with the service.
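
For a quicker read than the full YAML, the Available condition can be extracted directly. This is a sketch that assumes kubectl is configured for the cluster; the function name is hypothetical.

```shell
# Print the status of the APIService's "Available" condition
# ("True" when the aggregated API is reachable).
apiservice_available() {
  kubectl get apiservice "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

# Fall back to an empty status if kubectl fails or is not installed.
status="$(apiservice_available v1beta1.external.metrics.k8s.io 2>/dev/null || true)"
if [ "$status" = "True" ]; then
  echo "external metrics API is available"
else
  echo "external metrics API is not available (status: ${status:-unknown})"
fi
```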

@DeepKandey

Got the response below, @win5923:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  annotations:
    meta.helm.sh/release-name: selenium-grid
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2023-07-18T04:22:28Z"
  labels:
    app.kubernetes.io/component: operator
    app.kubernetes.io/instance: selenium-grid
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: v1beta1.external.metrics.k8s.io
    app.kubernetes.io/part-of: keda-operator
    app.kubernetes.io/version: 2.10.1
    helm.sh/chart: keda-2.10.2
  name: v1beta1.external.metrics.k8s.io
  resourceVersion: "217776"
  uid: b49b2159-07fd-49b1-8f41-992d827bb6ab
spec:
  caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURFRENDQWZpZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFoTVJBd0RnWURWUVFLRXdkTFJVUkIKVDFKSE1RMHdDd1lEVlFRREV3UkxSVVJCTUI0WERUSXpNRGN4T0RBek1qRXlOVm9YRFRNek1EY3hOVEEwTWpFeQpOVm93SVRFUU1BNEdBMVVFQ2hNSFMwVkVRVTlTUnpFTk1Bc0dBMVVFQXhNRVMwVkVRVENDQVNJd0RRWUpLb1pJCmh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTkhZK3YvTU5KRmsxRmhyOEF5ZW9lN3FOVXlGWjE5Z3BrTTEKa2tDcXRYRTlwWG5ybk9kaGFLYnFjVXRGOGFETnNlZkozTHdOTHBPdWdQNXJHTWdWb1RKeStsT010TEI2anhNSApBK2pOMHZtR0JjdkQ0NlkxU1ZNOS85TE5hWndXT3Yzb0Q2THhPaXRDNHliMVNLaHdUZFVWVzl1VGNjcmNTbERWCjBhajJDY044TGhmc2sxdldNUXdHb1l4UUFOYVBaNWlBNXdmWFBsb08wRlVaZC9jVU1jOUJ3Q0llTGZHRDZ5djIKSUg1VitFQVRTSmFqZWw4dU9JYW1TeTZpMlJhb3krR2F2MkxrQ0Y0Y0xidXIvT1g3bkpUUzRRbWNNbkREUDVMVgpwU2U4MkhrK1FMOTN2L0FjKzFkOENSYzhWcVJjcE9lSUFJV055U3ZMY3czKzVXSnZYR2tDQXdFQUFhTlRNRkV3CkRnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFRkhJN1dQN3UKandxTjNFa3U2UHBETFlLcENySE1NQThHQTFVZEVRUUlNQWFDQkV0RlJFRXdEUVlKS29aSWh2Y05BUUVMQlFBRApnZ0VCQUtWTjd0V0FIT2k5Vk1odDdtSmhTMlU2ZG9zcVQ4aXRoTWZhcElOa2lNVzZ0U1lVT2xYTlB6akhBWkF3Ci91RVhTMFRrbzQyYXRRaVk4Tit6RHhNaHFYRkl6aEZBdXVmdkljQU5LRGNuVGxaQTllMzhLOVVlMjE5MDlvVWoKQkhGUnN5Nk5wWG9DMlBON0ZpaENYZTZWNWxjTktjdFp1a3Q1WUU5SHFBSzFQNWJtME9zOEVzNlVYdGhlaHJHSwpzb1NCS2x5RUQ3anJuSEFyMGw1R1Vtc3d5OE5XKzF2aG45NGROMWl1emUxUUFUbkp1MCtvZjZuODRPOEtSM0hqClBEdGFWbVVadlhjdzc0MDJHMXZEQ0MzMHdUOGJJcXBMS3k1d0VxOTFLYXRoOVVPUHQ1MXhSbUxHK0hzQ1loZGgKYnR3OHkzZ3djVWtKblc1M0tmeVF0YmF1RElJPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  group: external.metrics.k8s.io
  groupPriorityMinimum: 100
  service:
    name: keda-operator-metrics-apiserver
    namespace: default
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2023-08-01T16:07:02Z"
    message: all checks passed
    reason: Passed
    status: "True"
    type: Available

@win5923

win5923 commented Aug 1, 2023

The issue is not with the APIService. It might be a problem with your cluster network.
