Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REST client: hosts marked dead for the first time should not be immediately retried #29230

Merged
merged 4 commits into from
Mar 27, 2018

Conversation

javanna
Copy link
Member

@javanna javanna commented Mar 23, 2018

This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected back-off was applied.

This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected backoff was applied.
@javanna javanna added >bug :Clients/Java Low Level REST Client Minimal dependencies Java Client for Elasticsearch v7.0.0 v6.3.0 v6.2.4 labels Mar 23, 2018
@javanna javanna requested a review from nik9000 March 23, 2018 21:00
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra


private static final long MIN_CONNECTION_TIMEOUT_NANOS = TimeUnit.MINUTES.toNanos(1);
private static final long MAX_CONNECTION_TIMEOUT_NANOS = TimeUnit.MINUTES.toNanos(30);

static final DeadHostState INITIAL_DEAD_STATE = new DeadHostState();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

private final int failedAttempts;
private final long deadUntilNanos;
private final TimeSupplier timeSupplier;

DeadHostState() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is worth having a no-arg ctor at all. Maybe remove it entirely so the called has to think.


private DeadHostState() {
DeadHostState(TimeSupplier timeSupplier) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth javadocs on this to communicate that it is for building the initial dead state.

@@ -47,10 +51,19 @@ private DeadHostState() {
* that failed many consecutive times).
*/
DeadHostState(DeadHostState previousDeadHostState) {
this(previousDeadHostState, TimeSupplier.DEFAULT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it isn't worth having the the TimeSupplier.DEFAULT versions. The caller should use TimeSupplier.DEFAULT.

long timeoutNanos = (long)Math.min(MIN_CONNECTION_TIMEOUT_NANOS * 2 * Math.pow(2, previousDeadHostState.failedAttempts * 0.5 - 1),
MAX_CONNECTION_TIMEOUT_NANOS);
this.deadUntilNanos = System.nanoTime() + timeoutNanos;
this.deadUntilNanos = timeSupplier.getNanoTime() + timeoutNanos;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still call it nanoTime.


@Override
public int compareTo(DeadHostState other) {
return Long.compare(deadUntilNanos, other.deadUntilNanos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@Override
public String toString() {
return "DeadHostState{" +
"failedAttempts=" + failedAttempts +
", deadUntilNanos=" + deadUntilNanos +
'}';
}

static class TimeSupplier {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this an interface? I think it'd feel cleaner that way.

And it is probably worth javadoc to explain that how you fetch time is pluggable for testing. Lots of folks are used to this but anyone that hasn't had to write unit tests for code involving timeouts and such won't be used to it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do, I miss default methods here :)


public void testInitialDeadHostState() {
DeadHostState deadHostState = new DeadHostState();
assertThat(deadHostState.getDeadUntilNanos(), greaterThan(System.nanoTime()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep away from using the default one for all the unit tests just in case of funny pauses and stuff. Maybe one, very small, very paranoid check that use nanoTime is col though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I'd move the System.nanoTime call to before building the DeadHostState. That way if there is a super long pause due to the CI machine swapping or something terrible we still won't mind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted some smoke test that with the right time supplier we do the right thing. I think this way should be ok as we are just testing the difference between a recent object and a previous one.

}

public void testDeadHostStateFromPrevious() {
DeadHostState previous = new DeadHostState();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use a provider for the time then you can be quite explicit here. I think that is worth doing.

}
}

public void testDeadHostStateTimeouts() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like this should be part of testDeadHostStateFromPrevious. If you really want to have a version of the test that use System.nanoTime I'd name it with DefaultTimeProvder in the method name and name the method that uses the ConfigurableTimeSupplier more like testDeadHostStateFromPrevious.

@javanna
Copy link
Member Author

javanna commented Mar 26, 2018

@nik9000 I think I have addressed your comments, or at least most of them ;)

@javanna javanna merged commit 13f9e92 into elastic:master Mar 27, 2018
javanna added a commit that referenced this pull request Mar 27, 2018
…iately retried (#29230)

This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected backoff was applied.
javanna added a commit that referenced this pull request Mar 27, 2018
…iately retried (#29230)

This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected backoff was applied.
martijnvg added a commit that referenced this pull request Mar 28, 2018
* es/master: (22 commits)
  Fix building Javadoc JARs on JDK for client JARs (#29274)
  Require JDK 10 to build Elasticsearch (#29174)
  Decouple NamedXContentRegistry from ElasticsearchException (#29253)
  Docs: Update generating test coverage reports (#29255)
  [TEST] Fix issue with HttpInfo passed invalid parameter
  Remove all dependencies from XContentBuilder (#29225)
  Fix sporadic failure in CompositeValuesCollectorQueueTests
  Propagate ignore_unmapped to inner_hits (#29261)
  TEST: Increase timeout for testPrimaryReplicaResyncFailed
  REST client: hosts marked dead for the first time should not be immediately retried (#29230)
  TEST: Use different translog dir for a new engine
  Make SearchStats implement Writeable (#29258)
  [Docs] Spelling and grammar changes to reindex.asciidoc (#29232)
  Do not optimize append-only if seen normal op with higher seqno (#28787)
  [test] packaging: gradle tasks for groovy tests (#29046)
  Prune only gc deletes below local checkpoint (#28790)
  remove testUnassignedShardAndEmptyNodesInRoutingTable
  #28745: remove extra option in the composite rest tests
  Fold EngineDiskUtils into Store, for better lock semantics (#29156)
  Add file permissions checks to precommit task
  ...
martijnvg added a commit that referenced this pull request Mar 28, 2018
* es/6.x:
  Fix building Javadoc JARs on JDK for client JARs (#29274)
  Require JDK 10 to build Elasticsearch (#29174)
  Decouple NamedXContentRegistry from ElasticsearchException (#29253)
  Docs: Update generating test coverage reports (#29255)
  [TEST] Fix issue with HttpInfo passed invalid parameter
  Remove all dependencies from XContentBuilder (#29225)
  Fix sporadic failure in CompositeValuesCollectorQueueTests
  Propagate ignore_unmapped to inner_hits (#29261)
  TEST: Increase timeout for testPrimaryReplicaResyncFailed
  REST client: hosts marked dead for the first time should not be immediately retried (#29230)
  Make SearchStats implement Writeable (#29258)
  [Docs] Spelling and grammar changes to reindex.asciidoc (#29232)
  [test] packaging: gradle tasks for groovy tests (#29046)
  remove testUnassignedShardAndEmptyNodesInRoutingTable
  Add file permissions checks to precommit task
  Remove execute mode bit from source files
  #28745: remove 7.x option in the composite rest tests.
  Optimize the composite aggregation for match_all and range queries (#28745)
  Clarify deprecation warning for auto_generate_phrase_query (#29204)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants