Fixed missed stopping of SchedulerEngine #39193
The SchedulerEngine is used in several places in our code, and not all of them stopped it properly, which could cause test failures from leaked SchedulerEngine threads. This change stops the SchedulerEngine in those places to avoid the thread leaks that cause CI failures and noise. Closes elastic#38875
Pinging @elastic/es-distributed
Pinging @elastic/es-analytics-geo
Pinging @elastic/es-core-features
Hit failure fixed by #39198.
@elasticmachine run elasticsearch-ci/default-distro
I haven't looked at the stuff outside of ILM, but I had some questions around the ILM changes.
@@ -137,6 +138,12 @@ protected Clock getClock() {
    }
    indexLifecycleInitialisationService.set(new IndexLifecycleService(settings, client, clusterService, threadPool,
        getClock(), System::currentTimeMillis, xContentRegistry));
    clusterService.addLifecycleListener(new LifecycleListener() {
The IndexLifecycle plugin closes the scheduler when it is itself closed. Isn't this already done by the node when it calls close() on all the plugins?
It does, but I think we may want to stop before then. The node close mechanism calls stop() first, which stops the cluster service, but the IndexLifecycle job can still run before the plugin is actually closed. That said, I can remove this part of the change, since I don't think it affects the issue at hand.
    private ClusterService clusterService;
    private LongSupplier nowSupplier;
    private SchedulerEngine.Job scheduledJob;
    private boolean closed = false;
I think the signal of whether the scheduler is shut down should be contained within the SchedulerEngine. Is this not the same as checking engine.isShutdown()? For example:
--- a/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/scheduler/SchedulerEngine.java
+++ b/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/scheduler/SchedulerEngine.java
@@ -123,6 +123,11 @@ public class SchedulerEngine {
         jobs.forEach(this::add);
     }
+
+    public boolean isShutdown() {
+        return scheduler.isShutdown();
+    }
+
     public void stop() {
         scheduler.shutdownNow();
         try {
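To make the suggestion above concrete, here is a minimal, self-contained sketch of a scheduler wrapper that reports its own shutdown state. MiniSchedulerEngine is a hypothetical stand-in built on java.util.concurrent, not the actual x-pack SchedulerEngine, and the five-second wait is an assumed value.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for SchedulerEngine; not the x-pack class.
final class MiniSchedulerEngine {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Callers ask the engine itself instead of tracking a separate "closed" flag.
    public boolean isShutdown() {
        return scheduler.isShutdown();
    }

    public void stop() {
        scheduler.shutdownNow();
        try {
            // Assumed timeout; gives running tasks a moment to finish so no thread is leaked.
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
        }
    }
}
```

With this shape, a caller that holds the engine would not need its own boolean field, which is the point of the review comment.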
@talevy thanks for reviewing. I simplified this so that there are no changes to persistent tasks; instead, closing the plugin terminates the SchedulerEngine. The change to IndexLifecycle is a bit ugly due to the lazy initialization. Essentially, we don't want the method that creates the SchedulerEngine to run concurrently with close; otherwise we might miss closing the SchedulerEngine.
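The race described here can be sketched as follows. This is only an illustration under assumptions: LazySchedulerOwner and its method names are hypothetical and do not come from the PR, and plain synchronized methods are just one way to keep lazy creation and close from interleaving.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical owner of a lazily created scheduler; names are not from the PR.
final class LazySchedulerOwner implements AutoCloseable {
    private ScheduledExecutorService scheduler; // guarded by this
    private boolean closed;                     // guarded by this

    // Creates the scheduler on first use. Returns false if close() already ran,
    // so a scheduler can never be created after the owner was closed.
    synchronized boolean maybeStartScheduler() {
        if (closed) {
            return false;
        }
        if (scheduler == null) {
            scheduler = Executors.newSingleThreadScheduledExecutor();
        }
        return true;
    }

    // Holds the same lock as maybeStartScheduler(), so it either sees the
    // scheduler and stops it, or prevents it from ever being created.
    @Override
    public synchronized void close() {
        closed = true;
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }

    synchronized boolean hasLiveScheduler() {
        return scheduler != null && !scheduler.isShutdown();
    }
}
```

Without the shared lock, close() could observe a null scheduler just before another thread creates one, and that scheduler's thread would then never be stopped.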
Thanks, Jay! I hope this fixes the leakage.
* elastic/master:
  Ensure index commit released when testing timeouts (elastic#39273)
  Avoid using TimeWarp in TransformIntegrationTests. (elastic#39277)
  Fixed missed stopping of SchedulerEngine (elastic#39193)
  [CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (elastic#39269)
  Muting AutoFollowIT.testAutoFollowManyIndices (elastic#39264)
  Clarify the use of sleep in CCR test
  Fix testCannotShrinkLeaderIndex (elastic#38529)
  Fix CCR tests that manipulate transport requests
  Align generated release notes with doc standards (elastic#39234)
  Mute test (elastic#39248)
  ReadOnlyEngine should update translog recovery state information (elastic#39238)
  Wrap accounting breaker check in assertBusy (elastic#39211)
  Simplify and Fix Synchronization in InternalTestCluster (elastic#39168)
  [Tests] Make testEngineGCDeletesSetting deterministic (elastic#38942)
  Extend nextDoc to delegate to the wrapped doc-value iterator for date_nanos (elastic#39176)
  Change ShardFollowTask to reuse common serialization logic (elastic#39094)
  Replace superfluous usage of Counter with Supplier (elastic#39048)
  Disable bwc tests for elastic#39094