-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REST client: hosts marked dead for the first time should not be immediately retried #29230
REST client: hosts marked dead for the first time should not be immediately retried #29230
Conversation
This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected backoff was applied.
Pinging @elastic/es-core-infra |
|
||
private static final long MIN_CONNECTION_TIMEOUT_NANOS = TimeUnit.MINUTES.toNanos(1); | ||
private static final long MAX_CONNECTION_TIMEOUT_NANOS = TimeUnit.MINUTES.toNanos(30); | ||
|
||
static final DeadHostState INITIAL_DEAD_STATE = new DeadHostState(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
private final int failedAttempts; | ||
private final long deadUntilNanos; | ||
private final TimeSupplier timeSupplier; | ||
|
||
DeadHostState() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is worth having a no-arg ctor at all. Maybe remove it entirely so the called has to think.
|
||
private DeadHostState() { | ||
DeadHostState(TimeSupplier timeSupplier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is worth javadocs on this to communicate that it is for building the initial dead state.
@@ -47,10 +51,19 @@ private DeadHostState() { | |||
* that failed many consecutive times). | |||
*/ | |||
DeadHostState(DeadHostState previousDeadHostState) { | |||
this(previousDeadHostState, TimeSupplier.DEFAULT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it isn't worth having the the TimeSupplier.DEFAULT
versions. The caller should use TimeSupplier.DEFAULT
.
long timeoutNanos = (long)Math.min(MIN_CONNECTION_TIMEOUT_NANOS * 2 * Math.pow(2, previousDeadHostState.failedAttempts * 0.5 - 1), | ||
MAX_CONNECTION_TIMEOUT_NANOS); | ||
this.deadUntilNanos = System.nanoTime() + timeoutNanos; | ||
this.deadUntilNanos = timeSupplier.getNanoTime() + timeoutNanos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still call it nanoTime
.
|
||
@Override | ||
public int compareTo(DeadHostState other) { | ||
return Long.compare(deadUntilNanos, other.deadUntilNanos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
@Override | ||
public String toString() { | ||
return "DeadHostState{" + | ||
"failedAttempts=" + failedAttempts + | ||
", deadUntilNanos=" + deadUntilNanos + | ||
'}'; | ||
} | ||
|
||
static class TimeSupplier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this an interface? I think it'd feel cleaner that way.
And it is probably worth javadoc to explain that how you fetch time is pluggable for testing. Lots of folks are used to this but anyone that hasn't had to write unit tests for code involving timeouts and such won't be used to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do, I miss default methods here :)
|
||
public void testInitialDeadHostState() { | ||
DeadHostState deadHostState = new DeadHostState(); | ||
assertThat(deadHostState.getDeadUntilNanos(), greaterThan(System.nanoTime())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep away from using the default one for all the unit tests just in case of funny pauses and stuff. Maybe one, very small, very paranoid check that use nanoTime is col though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I'd move the System.nanoTime
call to before building the DeadHostState
. That way if there is a super long pause due to the CI machine swapping or something terrible we still won't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted some smoke test that with the right time supplier we do the right thing. I think this way should be ok as we are just testing the difference between a recent object and a previous one.
} | ||
|
||
public void testDeadHostStateFromPrevious() { | ||
DeadHostState previous = new DeadHostState(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use a provider for the time then you can be quite explicit here. I think that is worth doing.
} | ||
} | ||
|
||
public void testDeadHostStateTimeouts() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels to me like this should be part of testDeadHostStateFromPrevious
. If you really want to have a version of the test that use System.nanoTime
I'd name it with DefaultTimeProvder
in the method name and name the method that uses the ConfigurableTimeSupplier
more like testDeadHostStateFromPrevious
.
@nik9000 I think I have addressed your comments, or at least most of them ;) |
…iately retried (#29230) This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected backoff was applied.
…iately retried (#29230) This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected backoff was applied.
* es/master: (22 commits) Fix building Javadoc JARs on JDK for client JARs (#29274) Require JDK 10 to build Elasticsearch (#29174) Decouple NamedXContentRegistry from ElasticsearchException (#29253) Docs: Update generating test coverage reports (#29255) [TEST] Fix issue with HttpInfo passed invalid parameter Remove all dependencies from XContentBuilder (#29225) Fix sporadic failure in CompositeValuesCollectorQueueTests Propagate ignore_unmapped to inner_hits (#29261) TEST: Increase timeout for testPrimaryReplicaResyncFailed REST client: hosts marked dead for the first time should not be immediately retried (#29230) TEST: Use different translog dir for a new engine Make SearchStats implement Writeable (#29258) [Docs] Spelling and grammar changes to reindex.asciidoc (#29232) Do not optimize append-only if seen normal op with higher seqno (#28787) [test] packaging: gradle tasks for groovy tests (#29046) Prune only gc deletes below local checkpoint (#28790) remove testUnassignedShardAndEmptyNodesInRoutingTable #28745: remove extra option in the composite rest tests Fold EngineDiskUtils into Store, for better lock semantics (#29156) Add file permissions checks to precommit task ...
* es/6.x: Fix building Javadoc JARs on JDK for client JARs (#29274) Require JDK 10 to build Elasticsearch (#29174) Decouple NamedXContentRegistry from ElasticsearchException (#29253) Docs: Update generating test coverage reports (#29255) [TEST] Fix issue with HttpInfo passed invalid parameter Remove all dependencies from XContentBuilder (#29225) Fix sporadic failure in CompositeValuesCollectorQueueTests Propagate ignore_unmapped to inner_hits (#29261) TEST: Increase timeout for testPrimaryReplicaResyncFailed REST client: hosts marked dead for the first time should not be immediately retried (#29230) Make SearchStats implement Writeable (#29258) [Docs] Spelling and grammar changes to reindex.asciidoc (#29232) [test] packaging: gradle tasks for groovy tests (#29046) remove testUnassignedShardAndEmptyNodesInRoutingTable Add file permissions checks to precommit task Remove execute mode bit from source files #28745: remove 7.x option in the composite rest tests. Optimize the composite aggregation for match_all and range queries (#28745) Clarify deprecation warning for auto_generate_phrase_query (#29204)
This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected back-off was applied.