
Retry network operations with registries on I/O errors #3351

Merged

Conversation

rmannibucau
Contributor

Follow-up of #1409.

It enables HTTP client retries when the blob content is re-readable.

@meltsufin meltsufin left a comment
Contributor


@rmannibucau Thanks for the PR, but it looks like some unit tests are broken. Do they pass for you locally?

@chanseokoh chanseokoh left a comment
Member


Thanks for your contribution!

@@ -312,6 +315,19 @@ private Response call(String httpMethod, URL url, Request request, HttpTransport
httpTransport
.createRequestFactory()
.buildRequest(httpMethod, new GenericUrl(url), request.getHttpContent())
.setIOExceptionHandler(new HttpBackOffIOExceptionHandler(new ExponentialBackOff()) {
Member

What's the maximum number of retries (and the maximum time) until this eventually gives up? We don't want Jib to run for too long when the error is not recoverable (e.g., the server is down for a day).

Contributor Author

In seconds (quoted from the javadoc of the exponential backoff):

   request#     retry_interval     randomized_interval
  
   1             0.5                [0.25,   0.75]
   2             0.75               [0.375,  1.125]
   3             1.125              [0.562,  1.687]
   4             1.687              [0.8435, 2.53]
   5             2.53               [1.265,  3.795]
   6             3.795              [1.897,  5.692]
   7             5.692              [2.846,  8.538]
   8             8.538              [4.269, 12.807]
   9            12.807              [6.403, 19.210]
   10           19.210              BackOff.STOP

I guess another PR can change the retry logic to not use exponential backoff, but these defaults should tolerate the small hiccups I saw. Making this configurable only requires propagating the configuration, so it can be done later once the strategies are discussed/decided.
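
For reference, a minimal sketch of how the backoff could be bounded explicitly instead of relying on those defaults, assuming the google-http-client ExponentialBackOff.Builder API; the values are illustrative only, not what this PR ships:

import com.google.api.client.http.HttpBackOffIOExceptionHandler;
import com.google.api.client.util.ExponentialBackOff;

class BoundedRetrySketch {

  // Illustrative bounds: give up after at most ~30 seconds in total,
  // rather than using the ExponentialBackOff defaults quoted above.
  static HttpBackOffIOExceptionHandler boundedIoExceptionHandler() {
    ExponentialBackOff backOff =
        new ExponentialBackOff.Builder()
            .setInitialIntervalMillis(500) // first retry after ~0.5 s
            .setMultiplier(1.5) // each interval grows by 50%
            .setRandomizationFactor(0.5) // +/- 50% jitter
            .setMaxIntervalMillis(5_000) // cap any single wait at 5 s
            .setMaxElapsedTimeMillis(30_000) // stop retrying after 30 s overall
            .build();
    return new HttpBackOffIOExceptionHandler(backOff);
  }
}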

return;
}
// make the test fail
ex.sendResponseHeaders(123, 0);
Member

I don't think this works as you intended. 123 actually causes IOException. OTOH, 400 works.

There are some minor style errors, and given the above, I'd have the code look like this:

// Imports the test needs; FailoverHttpClient, Request, and LogEvent are Jib's own
// types and are assumed to be in scope in the test's package.
import static com.google.common.truth.Truth.assertThat;

import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import org.junit.Test;

@Test
public void testRetries() throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(0), 1);
    AtomicBoolean failed = new AtomicBoolean();
    // The first request is answered with status 123 (which makes the client throw an
    // IOException); every subsequent request is answered with 200.
    server
        .createContext("/")
        .setHandler(
            exchange ->
                exchange.sendResponseHeaders(failed.compareAndSet(false, true) ? 123 : 200, 0));
    try {
      server.start();
      int port = server.getAddress().getPort();
      List<LogEvent> events = new ArrayList<>();
      int returnCode =
          new FailoverHttpClient(true, true, events::add)
              .get(new URL("http://localhost:" + port), Request.builder().build())
              .getStatusCode();
      // The failed first attempt must have been retried transparently.
      assertThat(returnCode).isEqualTo(200);
      assertThat(failed.get()).isTrue();
      assertThat(events)
          .containsExactly(
              LogEvent.warn("GET http://localhost:" + port + " failed and will be retried"));
    } finally {
      server.stop(0);
    }
}

@chanseokoh chanseokoh Aug 5, 2021
Member

I will take care of this myself in a follow-up PR (#3390).

@chanseokoh
Member

Also, there are test failures, so please take a look at them too.

@rmannibucau
Contributor Author

Just a quick message to say I am AFK and will look at this when I'm back, unless someone beats me to it ;).
The only point I am not sure about is the writable blob, which should then be split to make retries work (Blobs.of(content, retryable)? or just ensure the network call is done lazily); otherwise it will not reach the goal.

@chanseokoh
Member

Thanks for the update. Feel free to take your time.

The only point I am not sure about is the writable blob, which should then be split to make retries work (Blobs.of(content, retryable)?

I think adding another Blobs.from() overload is a reasonable option?

  public static Blob from(WritableContents writable, boolean retryable) {
    return new WritableContentsBlob(writable, retryable);
  }
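
A hypothetical usage sketch of that overload (class name and jib-core package paths are assumptions; the retryable flag is the suggestion above, not yet existing API):

import com.google.cloud.tools.jib.blob.Blob;
import com.google.cloud.tools.jib.blob.Blobs;
import java.nio.charset.StandardCharsets;

class RetryableBlobSketch {

  // Hypothetical usage: the lambda is a WritableContents, and retryable = true
  // would tell the HTTP client that the contents can be regenerated, so a failed
  // request may safely be re-sent.
  static Blob lazyRetryableBlob() {
    return Blobs.from(
        outputStream -> outputStream.write("layer data".getBytes(StandardCharsets.UTF_8)),
        true);
  }
}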

@chanseokoh
Member

Oops, I tried to resolve the merge conflict but it broke the code.
f34b525#diff-5b8da1dfe36a07daee79c63d41fdbecb8c864182144f2ca7449e90b6ed2b7914R66

It should be

-entry, entry.isFile() ? Blobs.from(entry.getPath()) : Blobs.from(ignored -> {}), true);
+entry, entry.isFile() ? Blobs.from(entry.getPath()) : Blobs.from(ignored -> {}, true));

@chanseokoh
Member

I think this LGTM. Thanks for the contribution! Will take another look in the afternoon. There are a few things that I want to clean up, but I can take care of this myself later.

@chanseokoh chanseokoh added this to the common next release milestone Aug 5, 2021
@sonarcloud

sonarcloud bot commented Aug 5, 2021

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

83.3% Coverage
0.0% Duplication

@chanseokoh chanseokoh left a comment
Member


Awesome! I think this will significantly reduce user friction. Something we always wanted to do. Thanks again!

@chanseokoh chanseokoh merged commit 723ba94 into GoogleContainerTools:master Aug 5, 2021
@chanseokoh
Member

Follow-up: #3390

@chanseokoh
Member

@rmannibucau you can test the retry feature with 3.1.4. I had to tweak it a bit due to #3422.

Thanks again for your contribution!

@chanseokoh chanseokoh changed the title from "basic retry for http client" to "Retry network operations with registries on I/O errors" Aug 22, 2021
@mpeddada1 mpeddada1 mentioned this pull request Feb 24, 2023