Support HTTP operation retry with exponential backoff (for status code >= 500) #3087
Comments
I am working on the PR, but I do not have the rights to assign the issue to myself.
Thanks @rohanKanojia!
Which operations would be affected by this change?
I would do this in
Regarding that method, it would be this elegant:
- Response response = client.newCall(request).execute();
+ Response response = retryWithExponentialBackoff(client, request);
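To make the proposed one-line change concrete, here is a minimal sketch of what a `retryWithExponentialBackoff` helper could look like. It is not the kubernetes-client implementation: the class name `RetrySketch`, the `IntSupplier` standing in for the OkHttp call, and the doubling rule (interval times 2 to the power of the attempt number) are all assumptions made for illustration.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper for illustration; not the actual kubernetes-client code. */
final class RetrySketch {

    /** Assumed rule: delay before retry number `attempt` (0-based) is interval * 2^attempt. */
    static long delayMillis(long backoffIntervalMillis, int attempt) {
        return backoffIntervalMillis << attempt;
    }

    /**
     * Runs `call` (a stand-in that returns an HTTP status code) once, then retries
     * while the status is >= 500, at most `backoffLimit` additional times.
     * Returns the last status code seen.
     */
    static int retryWithExponentialBackoff(IntSupplier call, int backoffLimit,
                                           long backoffIntervalMillis) {
        int status = call.getAsInt();
        for (int attempt = 0; attempt < backoffLimit && status >= 500; attempt++) {
            try {
                Thread.sleep(delayMillis(backoffIntervalMillis, attempt));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                break;
            }
            status = call.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated server: two 500 responses, then 200.
        IntSupplier flaky = () -> ++calls[0] <= 2 ? 500 : 200;
        int status = retryWithExponentialBackoff(flaky, 3, 10);
        System.out.println("status=" + status + " after " + calls[0] + " calls");
    }
}
```

With a limit of 0 the loop body never runs, so the call is executed exactly once, which matches the backward-compatible default described in the issue.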
HyukjinKwon pushed a commit to apache/spark that referenced this issue Jul 7, 2021

### What changes were proposed in this pull request?

Upgrading the kubernetes-client to 5.5.0

### Why are the changes needed?

There are [several bugfixes](https://github.com/fabric8io/kubernetes-client/releases/tag/v5.5.0) but the main reason is version 5.5.0 contains [Support HTTP operation retry with exponential backoff (for status code >= 500)](fabric8io/kubernetes-client#3087).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By running the integration tests including `persistentVolume` tests:

```
./resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh \
    --spark-tgz $TARBALL_TO_TEST --hadoop-profile $HADOOP_PROFILE --exclude-tags r --include-tags persistentVolume
...
[INFO] --- scalatest-maven-plugin:2.0.0:test (integration-test) spark-kubernetes-integration-tests_2.12 ---
Discovery starting.
Discovery completed in 413 milliseconds.
Run starting. Expected test count is: 26
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
Run completed in 18 minutes, 34 seconds.
Total number of tests run: 26
Suites: completed 2, aborted 0
Tests: succeeded 26, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Checked the compatibility matrix and the same k8s versions are supported as were by version 5.4.1.

Closes #33233 from attilapiros/SPARK-36026.

Authored-by: attilapiros <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
attilapiros added a commit to apache/spark that referenced this issue Jul 13, 2021

…iness

### What changes were proposed in this pull request?

Setting `kubernetes.request.retry.backoffLimit` by default to 3 when the user haven't specified any value for it. This way when k8s API servers gives back HTTP status code >= 500 then an exponential backoff will be triggered (where `kubernetes.request.retry.backoffInterval` is 1000ms by default). For details please check fabric8io/kubernetes-client#3087.

### Why are the changes needed?

We experienced some internal K8s errors for example when the `etcdserver` leader election was ongoing the error was propagated to the API client and caused an issue in Spark:

```
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/dex-app-bl24w4z9/pods/sparkpi-10-fcd3f6781a874212-driver. Message: etcdserver: leader changed. Received status: Status(apiVersion=v1, code=500, details=null, kind=Status, message=etcdserver: leader changed, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, status=Failure, additionalProperties={}).
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Running the integration tests along with `log4j.logger.org.apache.spark.deploy.k8s.SparkKubernetesClientFactory=DEBUG` the log4j config.
It produced the following log: ``` 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: 21/07/08 11:01:14 DEBUG org.apache.spark.deploy.k8s.SparkKubernetesClientFactory: Kubernetes client config: { 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestConfig" : { 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "username" : null, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "password" : null, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "oauthToken" : null, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "oauthTokenProvider" : null, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateUsername" : null, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateGroups" : [ null ], 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateExtras" : { }, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "watchReconnectInterval" : 1000, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "watchReconnectLimit" : -1, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "connectionTimeout" : 10000, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "uploadConnectionTimeout" : 10000, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "uploadRequestTimeout" : 120000, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffLimit" : 3, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffInterval" : 1000, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestTimeout" : 10000, 21/07/08 11:01:14.873 
ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "rollingTimeout" : 900000, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "scaleTimeout" : 600000, 21/07/08 11:01:14.873 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "loggingInterval" : 20000, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "websocketTimeout" : 5000, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "websocketPingInterval" : 0, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "maxConcurrentRequests" : 64, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "maxConcurrentRequestsPerHost" : 5, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateGroup" : null 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "contexts" : [ { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "talos-default", 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "namespace" : "default", 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "admintalos-default" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "admintalos-default" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "arn:aws:eks:us-west-2:392479084068:cluster/mow", 21/07/08 11:01:14.874 
ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "arn:aws:eks:us-west-2:392479084068:cluster/mow" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "arn:aws:eks:us-west-2:392479084068:cluster/mow" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "minikube", 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "extensions" : [ { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "context_info" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: } ], 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "namespace" : "default", 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "minikube" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "minikube" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "", 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "mow" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: } ], 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO 
ProcessUtils: "currentContext" : { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "context" : { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "cluster" : "minikube", 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "extensions" : [ { 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "context_info" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: } ], 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "namespace" : "default", 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "user" : "minikube" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "name" : "minikube" 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: }, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "maxConcurrentRequests" : 64, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "maxConcurrentRequestsPerHost" : 5, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "autoConfigure" : false, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "trustCerts" : false, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "disableHostnameVerification" : false, 21/07/08 11:01:14.874 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "masterUrl" : "https://192.168.64.127:8443/", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "apiVersion" : "v1", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "namespace" : "a0993113b8084cd3868b3052e698b17f", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "caCertFile" : 
"/Users/attilazsoltpiros/.minikube/ca.crt", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "clientCertFile" : "/Users/attilazsoltpiros/.minikube/profiles/minikube/client.crt", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "clientKeyFile" : "/Users/attilazsoltpiros/.minikube/profiles/minikube/client.key", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "clientKeyAlgo" : "RSA", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "clientKeyPassphrase" : "changeit", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "watchReconnectInterval" : 1000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "watchReconnectLimit" : -1, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "connectionTimeout" : 10000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "uploadConnectionTimeout" : 10000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "uploadRequestTimeout" : 120000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffLimit" : 3, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffInterval" : 1000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestTimeout" : 10000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "rollingTimeout" : 900000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "scaleTimeout" : 600000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "loggingInterval" : 20000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "websocketTimeout" : 5000, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: 
"websocketPingInterval" : 0, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateGroups" : [ null ], 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "impersonateExtras" : { }, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "http2Disable" : false, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "noProxy" : [ ], 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "tlsVersions" : [ "TLS_1_2" ], 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "errorMessages" : { 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "401" : "Unauthorized! Token may have expired! Please log-in again.", 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "403" : "Forbidden! User minikube doesn't have permission." 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: } 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: } ``` Which contains the expected values: ``` 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffLimit" : 3, 21/07/08 11:01:14.875 ScalaTest-main-running-KubernetesSuite INFO ProcessUtils: "requestRetryBackoffInterval" : 1000, ``` Closes #33261 from attilapiros/SPARK-35334. Authored-by: attilapiros <[email protected]> Signed-off-by: attilapiros <[email protected]>
sunchao pushed a commit to sunchao/spark that referenced this issue Dec 8, 2021
…iness

Closes apache#33261 from attilapiros/SPARK-35334.

Authored-by: attilapiros <[email protected]>
Signed-off-by: attilapiros <[email protected]>
(cherry picked from commit 03e48c8)
Signed-off-by: Dongjoon Hyun <[email protected]>
According to the API conventions, when the HTTP status code is one of the following:
then the suggested client recovery behavior is "retry with exponential backoff".
(In the case of 504 it is "Increase the value of the timeout param and retry with exponential backoff", but in the first PR I would focus only on the retries.)
This is a real problem I faced in a production system running Apache Spark, and when I looked around I found several other cases with the same root cause.
In Apache Spark it was caused by an etcd server leader election:
Other applications suffer from similar issues, which is why I think we can provide a solution here in the Kubernetes Client.
My solution would introduce 2 new configs:
Here "kubernetes.request.retry.backoffLimit" would be 0 by default to keep backward compatibility.
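As a rough illustration of how these two settings could interact, the sketch below reads them from a `Properties` object and prints the resulting delay schedule. The property names and the defaults (0 and 1000 ms) come from this thread; the class `RetryConfigSketch`, its helper methods, and the assumption that each delay doubles the previous one are hypothetical and only serve the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical sketch; not how kubernetes-client actually reads its configuration. */
final class RetryConfigSketch {
    static final String LIMIT_KEY = "kubernetes.request.retry.backoffLimit";
    static final String INTERVAL_KEY = "kubernetes.request.retry.backoffInterval";

    /** Maximum number of retries; 0 (no retries) when unset. */
    static int backoffLimit(Properties p) {
        return Integer.parseInt(p.getProperty(LIMIT_KEY, "0"));
    }

    /** Initial backoff interval in milliseconds; 1000 when unset. */
    static long backoffIntervalMillis(Properties p) {
        return Long.parseLong(p.getProperty(INTERVAL_KEY, "1000"));
    }

    /** Delays before each retry, assuming the interval doubles on every attempt. */
    static List<Long> delaySchedule(Properties p) {
        List<Long> delays = new ArrayList<>();
        long delay = backoffIntervalMillis(p);
        int limit = backoffLimit(p);
        for (int i = 0; i < limit; i++) {
            delays.add(delay);
            delay *= 2;
        }
        return delays;
    }

    public static void main(String[] args) {
        // Nothing configured: limit defaults to 0, so there are no retries.
        Properties defaults = new Properties();
        System.out.println(delaySchedule(defaults));

        // The value Spark later chose as its own default for the limit.
        Properties spark = new Properties();
        spark.setProperty(LIMIT_KEY, "3");
        System.out.println(delaySchedule(spark));
    }
}
```

Under these assumptions an unset limit yields an empty schedule, so existing users see no behavior change, while a limit of 3 with the default interval waits 1000, 2000 and 4000 ms before the successive retries.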