This repository has been archived by the owner on Sep 21, 2021. It is now read-only.

Readiness probe failed: HTTP probe failed with statuscode: 502 #603

Closed
bramvdklinkenberg opened this issue Jun 1, 2018 · 20 comments

@bramvdklinkenberg

Hi,

I am trying to deploy the Zalenium helm chart in my newly deployed AKS Kubernetes (1.9.6) cluster in Azure, but I can't get it to work. The pod is giving the log below:

[bram@xforce zalenium]$ kubectl logs -f zalenium-zalenium-hub-6bbd86ff78-m25t2
Kubernetes service account found.
Copying files for Dashboard...
cp: cannot create regular file '/home/seluser/videos/index.html': Permission denied
cp: cannot create directory '/home/seluser/videos/css': Permission denied
cp: cannot create directory '/home/seluser/videos/js': Permission denied
Starting Nginx reverse proxy...
Starting Selenium Hub...
..........08:49:14.052 [main] INFO o.o.grid.selenium.GridLauncherV3 - Selenium build info: version: '3.12.0', revision: 'unknown'
08:49:14.120 [main] INFO o.o.grid.selenium.GridLauncherV3 - Launching Selenium Grid hub on port 4445
...08:49:15.125 [main] INFO d.z.e.z.c.k.KubernetesContainerClient - Initialising Kubernetes support
..08:49:15.650 [main] WARN d.z.e.z.c.k.KubernetesContainerClient - Error initialising Kubernetes support.
io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [zalenium-zalenium-hub-6bbd86ff78-m25t2] in namespace: [default] failed.
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:62)
  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:71)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:206)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
  at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.<init>(KubernetesContainerClient.java:87)
  at de.zalando.ep.zalenium.container.ContainerFactory.createKubernetesContainerClient(ContainerFactory.java:35)
  at de.zalando.ep.zalenium.container.ContainerFactory.getContainerClient(ContainerFactory.java:22)
  at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.<clinit>(DockeredSeleniumStarter.java:59)
  at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(ZaleniumRegistry.java:74)
  at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(ZaleniumRegistry.java:62)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at java.lang.Class.newInstance(Class.java:442)
  at org.openqa.grid.web.Hub.<init>(Hub.java:93)
  at org.openqa.grid.selenium.GridLauncherV3$2.launch(GridLauncherV3.java:291)
  at org.openqa.grid.selenium.GridLauncherV3.launch(GridLauncherV3.java:122)
  at org.openqa.grid.selenium.GridLauncherV3.main(GridLauncherV3.java:82)
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc not verified:
    certificate: sha256/OyzkRILuc6LAX4YnMAIGrRKLmVnDgLRvCasxGXDhSoc=
    DN: CN=client, O=system:masters
    subjectAltNames: [10.0.0.1]
  at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:308)
  at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:268)
  at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:160)
  at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:256)
  at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:134)
  at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:113)
  at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:125)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:56)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  at io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:107)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
  at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
  at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200)
  at okhttp3.RealCall.execute(RealCall.java:77)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313)
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:770)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:195)
  ... 16 common frames omitted
08:49:15.651 [main] INFO d.z.e.z.c.k.KubernetesContainerClient - About to clean up any left over selenium pods created by Zalenium
Usage: <main class> [options]
  Options:
    --debug, -debug <Boolean> : enables LogLevel.FINE. Default: false
    --version, -version : Displays the version and exits. Default: false
    -browserTimeout <Integer> in seconds : number of seconds a browser session is allowed to hang while a WebDriver command is running (example: driver.get(url)). If the timeout is reached while a WebDriver command is still processing, the session will quit. Minimum value is 60. An unspecified, zero, or negative value means wait indefinitely.
    -matcher, -capabilityMatcher <String> class name : a class implementing the CapabilityMatcher interface. Specifies the logic the hub will follow to define whether a request can be assigned to a node. For example, if you want to have the matching process use regular expressions instead of exact match when specifying browser version. ALL nodes of a grid ecosystem would then use the same capabilityMatcher, as defined here.
    -cleanUpCycle <Integer> in ms : specifies how often the hub will poll running proxies for timed-out (i.e. hung) threads. Must also specify "timeout" option
    -custom <String> : comma separated key=value pairs for custom grid extensions. NOT RECOMMENDED -- may be deprecated in a future revision. Example: -custom myParamA=Value1,myParamB=Value2
    -host <String> IP or hostname : usually determined automatically. Most commonly useful in exotic network configurations (e.g. network with VPN) Default: 0.0.0.0
    -hubConfig <String> filename : a JSON file (following grid2 format), which defines the hub properties
    -jettyThreads, -jettyMaxThreads <Integer> : max number of threads for Jetty. An unspecified, zero, or negative value means the Jetty default value (200) will be used.
    -log <String> filename : the filename to use for logging. If omitted, will log to STDOUT
    -maxSession <Integer> : max number of tests that can run at the same time on the node, irrespective of the browser used
    -newSessionWaitTimeout <Integer> in ms : The time after which a new test waiting for a node to become available will time out. When that happens, the test will throw an exception before attempting to start a browser. An unspecified, zero, or negative value means wait indefinitely. Default: 600000
    -port <Integer> : the port number the server will use. Default: 4445
    -prioritizer <String> class name : a class implementing the Prioritizer interface. Specify a custom Prioritizer if you want to sort the order in which new session requests are processed when there is a queue. Default to null ( no priority = FIFO )
    -registry <String> class name : a class implementing the GridRegistry interface. Specifies the registry the hub will use. Default: de.zalando.ep.zalenium.registry.ZaleniumRegistry
    -role <String> : options are [hub], [node], or [standalone]. Default: hub
    -servlet, -servlets <String> : list of extra servlets the grid (hub or node) will make available. Specify multiple on the command line: -servlet tld.company.ServletA -servlet tld.company.ServletB. The servlet must exist in the path: /grid/admin/ServletA /grid/admin/ServletB
    -timeout, -sessionTimeout <Integer> in seconds : Specifies the timeout before the server automatically kills a session that hasn't had any activity in the last X seconds. The test slot will then be released for another test to use. This is typically used to take care of client crashes. For grid hub/node roles, cleanUpCycle must also be set.
    -throwOnCapabilityNotPresent <Boolean> true or false : If true, the hub will reject all test requests if no compatible proxy is currently registered. If set to false, the request will queue until a node supporting the capability is registered with the grid.
    -withoutServlet, -withoutServlets <String> : list of default (hub or node) servlets to disable. Advanced use cases only. Not all default servlets can be disabled. Specify multiple on the command line: -withoutServlet tld.company.ServletA -withoutServlet tld.company.ServletB
org.openqa.grid.common.exception.GridConfigurationException: Error creating class with de.zalando.ep.zalenium.registry.ZaleniumRegistry : null
  at org.openqa.grid.web.Hub.<init>(Hub.java:97)
  at org.openqa.grid.selenium.GridLauncherV3$2.launch(GridLauncherV3.java:291)
  at org.openqa.grid.selenium.GridLauncherV3.launch(GridLauncherV3.java:122)
  at org.openqa.grid.selenium.GridLauncherV3.main(GridLauncherV3.java:82)
Caused by: java.lang.ExceptionInInitializerError
  at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(ZaleniumRegistry.java:74)
  at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(ZaleniumRegistry.java:62)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at java.lang.Class.newInstance(Class.java:442)
  at org.openqa.grid.web.Hub.<init>(Hub.java:93)
  ... 3 more
Caused by: java.lang.NullPointerException
  at java.util.TreeMap.putAll(TreeMap.java:313)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.withLabels(BaseOperation.java:411)
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.withLabels(BaseOperation.java:48)
  at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.deleteSeleniumPods(KubernetesContainerClient.java:393)
  at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.initialiseContainerEnvironment(KubernetesContainerClient.java:339)
  at de.zalando.ep.zalenium.container.ContainerFactory.createKubernetesContainerClient(ContainerFactory.java:38)
  at de.zalando.ep.zalenium.container.ContainerFactory.getContainerClient(ContainerFactory.java:22)
  at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.<clinit>(DockeredSeleniumStarter.java:59)
  ... 11 more
...........................................................................................................................................................................................GridLauncher failed to start after 1 minute, failing...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   182  100   182    0     0  36103      0 --:--:-- --:--:-- --:--:-- 45500

A kubectl describe pod gives:
Warning Unhealthy 4m (x12 over 6m) kubelet, aks-agentpool-93668098-0 Readiness probe failed: HTTP probe failed with statuscode: 502

Zalenium Image Version(s):
dosel/zalenium:3

If using Kubernetes, specify your environment, and if relevant your manifests:
I use the templates as is from https://github.com/zalando/zalenium/tree/master/docs/k8s/helm

Expected Behavior -

The Zalenium pods run.

Actual Behavior - See above

@bramvdklinkenberg
Author

bramvdklinkenberg commented Jun 1, 2018

I guess it has something to do with RBAC, because of this part:
"Error initialising Kubernetes support. io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [zalenium-zalenium-hub-6bbd86ff78-m25t2] in namespace: [default] failed."

I created a clusterrole and clusterrolebinding for the service account zalenium-zalenium that is automatically created by the helm chart.

kubectl create clusterrole zalenium --verb=get,list,watch,update,delete,create,patch --resource=pods,deployments,secrets

kubectl create clusterrolebinding zalenium --clusterrole=zalenium --serviceaccount=default:zalenium-zalenium --namespace=default

@pearj
Collaborator

pearj commented Jun 2, 2018

This problem:
cp: cannot create regular file '/home/seluser/videos/index.html': Permission denied cp: cannot create directory '/home/seluser/videos/css': Permission denied cp: cannot create directory '/home/seluser/videos/js': Permission denied
is because you need to mount a volume at /home/seluser/videos.
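A minimal sketch of what that could look like in the hub pod spec, assuming an emptyDir volume (the names here are illustrative and the chart may already expose a persistence option for this):

    # Illustrative fragment of a pod spec, not the chart's exact layout.
    containers:
      - name: zalenium
        image: dosel/zalenium:3
        volumeMounts:
          - name: zalenium-videos
            mountPath: /home/seluser/videos   # where Zalenium copies videos and dashboard files
    volumes:
      - name: zalenium-videos
        emptyDir: {}                          # swap for a PersistentVolumeClaim to keep videos across restarts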

Regarding the role and rolebinding, take a look at:

https://github.com/zalando/zalenium/blob/master/docs/k8s/gke/plumbing.yaml

I think whoever contributed the helm chart wasn't using a cluster that had RBAC enabled.
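For a cluster that does have RBAC enabled, the permissions Zalenium needs look roughly like the sketch below (the names and the default namespace are assumptions; the plumbing.yaml linked above is the authoritative reference):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: zalenium-role            # illustrative name
      namespace: default
    rules:
      - apiGroups: [""]
        resources: ["pods", "pods/exec", "services"]
        verbs: ["get", "list", "watch", "create", "delete"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: zalenium-rolebinding     # illustrative name
      namespace: default
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: zalenium-role
    subjects:
      - kind: ServiceAccount
        name: zalenium-zalenium      # service account created by the helm chart
        namespace: default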

@bramvdklinkenberg
Author

bramvdklinkenberg commented Jun 2, 2018

@pearj I deployed it on an AKS cluster which has RBAC disabled (for now).
Even with the clusterrole and clusterrolebinding given in the plumbing yaml I still get the same error.

Instead of the helm deployment I also tried the separate yaml files... I created a clusterrole and clusterrolebinding, but I get the same error.

@bramvdklinkenberg
Author

Locally with minikube I get it to work when I create a clusterrolebinding binding the zalenium-zalenium service account to the cluster-admin clusterrole.

@bramvdklinkenberg
Author

I deployed the application the exact same way on an ACS cluster (Azure) and it works.

@diemol
Contributor

diemol commented Jun 5, 2018

Do you know what the difference is between ACS and AKS?

Besides that, I am not sure how to help, and you actually might be the first one trying to deploy Zalenium on AKS :) So I hope you get some success there and perhaps you can help us to improve the docs!

@bramvdklinkenberg
Author

The differences shouldn't be big. AKS is a Kubernetes PaaS solution on Azure and ACS is also a Kubernetes service, but more IaaS. Neither has RBAC enabled. The only difference k8s-wise is that ACS is running 1.7.7 and AKS is running 1.9.6.
Going to test whether Zalenium works in AKS with version 1.7.7 or not.

@bramvdklinkenberg
Author

It works with AKS and k8s version 1.7.7... but it also works with minikube and k8s version 1.10.0...
I'm a bit lost at the moment as to why it doesn't work with AKS and k8s v1.9.6 and higher.
Going to dive into it.

@pearj
Collaborator

pearj commented Jun 6, 2018

Maybe you're not allowed to create cluster role bindings in AKS? Only role bindings?
Zalenium doesn't specifically need a cluster role binding. You could grant the admin role for the namespace to the service account.
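Assuming the default namespace and the zalenium-zalenium service account created by the chart, that could be done with something like this (a sketch):

    kubectl create rolebinding zalenium-admin \
      --clusterrole=admin \
      --serviceaccount=default:zalenium-zalenium \
      --namespace=default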

@bramvdklinkenberg
Author

I can create clusterrolebindings, but that shouldn't be the issue, since RBAC is not enabled on AKS (or ACS).
I can just do a helm install of the chart without having to do anything with clusterrolebindings.
But on AKS with k8s v1.9.6 it just doesn't work.

@pearj
Collaborator

pearj commented Jun 6, 2018

@bramvdklinkenberg Looks like your error is:
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname kubernetes.default.svc not verified: certificate: sha256/OyzkRILuc6LAX4YnMAIGrRKLmVnDgLRvCasxGXDhSoc= DN: CN=client, O=system:masters subjectAltNames: [10.0.0.1]
Which is kinda weird, because the Kubernetes client can normally find the k8s CA certificate automatically. Unless v1.9.6 puts the CA cert in a different location?
Regardless, if you know where it is ending up on disk you can specify some environment variables that override the default kubernetes-client behaviour, see:
https://github.com/fabric8io/kubernetes-client/#configuring-the-client
It is probably the KUBERNETES_CERTS_CA_FILE environment variable you're after.
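As a sketch, if the CA ends up in the standard in-cluster location, that would mean adding something like this to the hub container's env (the path shown is the usual service-account mount; adjust it if AKS puts the cert elsewhere):

    - name: KUBERNETES_CERTS_CA_FILE
      value: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt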

@pearj
Collaborator

pearj commented Jun 6, 2018

It actually kinda looks like the certificate that the Kubernetes API server is giving you is wrong. Maybe file a bug with Microsoft?

The subject alt name is 10.0.0.1, instead of kubernetes.default.svc.

Maybe it's worth setting KUBERNETES_MASTER=https://10.0.0.1, as kubernetes.default.svc appears to have a broken certificate.

@pearj
Collaborator

pearj commented Jun 6, 2018

Here's the issue: Azure/AKS#399

@bramvdklinkenberg
Author

@pearj thanks! I will also create a support request in the Azure portal and refer to the github issue(s).

@pearj
Collaborator

pearj commented Jun 6, 2018

Looks like the next release of kubernetes-client contains a patch that will use the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables instead of defaulting to kubernetes.default.svc.

However, in the meantime I'm pretty sure if you set KUBERNETES_MASTER=https://10.0.0.1:443 as an environment variable on the zalenium container, that will fix your problem too.

@bramvdklinkenberg
Author

I added the KUBERNETES_MASTER env var to the chart and redeployed it. It works now!

- name: KUBERNETES_MASTER
  value: https://10.0.0.1:443

@diemol
Contributor

diemol commented Jun 11, 2018

That's great @bramvdklinkenberg!

Thanks @pearj for all the troubleshooting :)

Closing this issue.

@diemol diemol closed this as completed Jun 11, 2018
@bramvdklinkenberg
Author

Latest comment in Azure/AKS#399:
"We identified the bug. This impacts AKS clusters with newer infrastructure feature. We will update here once the rollout is completed"

@WFTesterMikeB

Hi there - Apologies for hijacking this thread, but I heard you have been working on getting Zalenium working with Kubernetes on Azure...

We have a Selenium grid working on Kubernetes, but wanted to get Zalenium working - are you able to share how this should work? (in light of the bug mentioned above)

This is the sequence of commands we currently use to bring up Kubernetes, having created the resource group already via the interface:

az aks get-credentials --resource-group XXX --name XXXX
kubectl run XXX --image selenium/hub:3.11.0 --port 4444
kubectl expose deployment XXXX --type=LoadBalancer --name=selenium-hub
kubectl get service selenium-hub --watch
kubectl run selenium-node-chrome --image selenium/node-chrome:3.11.0 --env="HUB_PORT_4444_TCP_ADDR=selenium-hub" --env="HUB_PORT_4444_TCP_PORT=4444"
kubectl scale deployment selenium-node-chrome --replicas=XX

az aks browse --resource-group XXX --name XXXX

Obviously the documentation for Zalenium gives docker commands to use with minikube to work locally, so we're unsure how to get this to work on Azure/cloud with Kubernetes.

Any help or suggestions would be valued.

Cheers

M

@bramvdklinkenberg
Author

bramvdklinkenberg commented Jun 25, 2018

@WFTesterMikeB, the issue I had is solved; that was an AKS/Kubernetes issue.
I deployed it using Helm.

With the --set flag or the values.yaml you can set specific configuration for your Zalenium deployment, for example:
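With the chart checked out from docs/k8s/helm, a Helm 2 style install could look roughly like this (the value key below is only illustrative; check the chart's values.yaml for the real keys):

    # Value keys are examples only; see the chart's values.yaml for the real ones.
    helm install --name zalenium ./zalenium \
      --set hub.serviceType=LoadBalancer

    # or keep your overrides in a file:
    helm install --name zalenium ./zalenium -f my-values.yaml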
