Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry HTTP operation in case IOException too (exponential backoff) #3293

Merged
merged 1 commit into from
Jul 2, 2021

Conversation

attilapiros
Copy link
Contributor

Description

Fixing #3291: Retrying the HTTP operation in case of IOException too.

Tested with unit tests.

With the following logger config:

$ cat kubernetes-client/src/test/resources/simplelogger.properties
org.slf4j.simpleLogger.defaultLogLevel=info
org.slf4j.simpleLogger.log.io.fabric8.kubernetes.client.dsl.base = debug

the unit tests produced the following example output (partial output):

[main] DEBUG io.fabric8.kubernetes.client.dsl.base.OperationSupport - HTTP operation on url: https://172.17.0.2:8443/api/v1/pods/test-pod should be retried as the response code was 500, retrying after 2000 millis
[main] DEBUG io.fabric8.kubernetes.client.dsl.base.OperationSupport - HTTP operation on url: https://172.17.0.2:8443/api/v1/pods/test-pod should be retried after 4000 millis because of IOException
java.io.IOException: For example java.net.ConnectException
        at io.fabric8.kubernetes.client.dsl.base.BaseOperationTest.lambda$newHttpClientWithSomeFailures$0(BaseOperationTest.java:245)
        at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:42)
        at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:103)
        at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I Added CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@centos-ci
Copy link

Can one of the admins verify this patch?

@sonarcloud
Copy link

sonarcloud bot commented Jul 2, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@rohanKanojia
Copy link
Member

@geoand : Would appreciate it if you could review this one since you suggested it.

@rohanKanojia rohanKanojia requested a review from manusa July 2, 2021 09:51
Copy link
Contributor

@geoand geoand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shawkins
Copy link
Contributor

shawkins commented Jul 2, 2021

Could the cases be combined?

    do {
      IOException ex = null;
      try {
        response = client.newCall(request).execute();
      } catch (IOException e) {
        ex = e;
      }
      doRetry = numRetries < requestRetryBackoffLimit && (response == null || response.code() >= 500);
      if (doRetry) {
        ...
      } else if (ex != null) {
        throw ex;
      }
    } while(doRetry);

Copy link
Member

@manusa manusa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thx!

@manusa manusa linked an issue Jul 2, 2021 that may be closed by this pull request
Copy link
Contributor

@shawkins shawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Just added the comment/nit about combining the retry cases.

@manusa
Copy link
Member

manusa commented Jul 2, 2021

Just added the comment/nit about combining the retry cases.

We can address those separately

@manusa manusa added this to the 5.6.0 milestone Jul 2, 2021
@manusa manusa merged commit 6bb55ee into fabric8io:master Jul 2, 2021
@attilapiros
Copy link
Contributor Author

Thanks for everybody!

dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Jul 30, 2021
### What changes were proposed in this pull request?

Upgrade Kubernetes Client Version to 5.6.0.

### Why are the changes needed?

The exponential backoff feature is extended with one more case:
[ Retry HTTP operation in case IOException too (exponential backoff)](fabric8io/kubernetes-client#3293)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested with existing unit and integration tests:

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
...
[INFO] BUILD SUCCESS
```

Closes #33593 from attilapiros/SPARK-36358.

Authored-by: attilapiros <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry setting should be applied when an IOException is thrown
7 participants