Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed missed stopping of SchedulerEngine #39193

Merged
merged 7 commits into from
Feb 21, 2019

Conversation

jaymode
Copy link
Member

@jaymode jaymode commented Feb 20, 2019

The SchedulerEngine is used in several places in our code and not all
of these usages properly stopped the SchedulerEngine, which could lead
to test failures due to leaked threads from the SchedulerEngine. This
change adds stopping to these usages in order to avoid the thread leaks
that cause CI failures and noise.

Closes #38875

The SchedulerEngine is used in several places in our code and not all
of these usages properly stopped the SchedulerEngine, which could lead
to test failures due to leaked threads from the SchedulerEngine. This
change adds stopping to these usages in order to avoid the thread leaks
that cause CI failures and noise.

Closes elastic#38875
@jaymode jaymode added >non-issue :Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. v7.0.0 :Data Management/ILM+SLM Index and Snapshot lifecycle management :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.7.0 v8.0.0 v7.2.0 labels Feb 20, 2019
@jaymode jaymode requested a review from talevy February 20, 2019 17:41
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@jaymode
Copy link
Member Author

jaymode commented Feb 20, 2019

Hit failure fixed by #39198

@elasticmachine run elasticsearch-ci/default-distro

Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the stuff outside of ILM, but I had some questions
around the ILM changes.

@@ -137,6 +138,12 @@ protected Clock getClock() {
}
indexLifecycleInitialisationService.set(new IndexLifecycleService(settings, client, clusterService, threadPool,
getClock(), System::currentTimeMillis, xContentRegistry));
clusterService.addLifecycleListener(new LifecycleListener() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexLifecycle Plugin closes the scheduler when it itself is closed. Is this not done by the node
where it calls close() on all the plugins?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does but I think we may want to stop before then. The node close mechanism calls stop() first, which stops the cluster service but the IndexLifecycle job can still run before the plugin is actually closed. That said, I can remove this part of the change since I don't think it will actually affect the issue at hand.

private ClusterService clusterService;
private LongSupplier nowSupplier;
private SchedulerEngine.Job scheduledJob;
private boolean closed = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the signal of whether the scheduler is shutdown should be
contained within the SchedulerEngine. is this not the same as checking engine.isShutdown():

--- a/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/scheduler/SchedulerEngine.java
+++ b/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/scheduler/SchedulerEngine.java
@@ -123,6 +123,11 @@ public class SchedulerEngine {
         jobs.forEach(this::add);
     }
 
+
+    public boolean isShutdown() {
+        return scheduler.isShutdown();
+    }
+
     public void stop() {
         scheduler.shutdownNow();
         try {

@jaymode
Copy link
Member Author

jaymode commented Feb 21, 2019

@talevy thanks for reviewing. I simplified this so that there are not changes to persistent tasks but instead use closing of the plugin to terminate the SchedulerEngine. The change to IndexLifecycle is a bit ugly due to the lazy initialization. Essentially we don't want concurrent calls to the method that creates the SchedulerEngine and close otherwise we might actually miss closing the SchedulerEngine.

Copy link
Contributor

@talevy talevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Jay!

@talevy
Copy link
Contributor

talevy commented Feb 21, 2019

I hope this fixes the leakage

@jaymode jaymode merged commit 7db7cda into elastic:master Feb 21, 2019
@jaymode jaymode deleted the stop_scheduler_engine branch February 21, 2019 21:24
jaymode added a commit that referenced this pull request Feb 21, 2019
The SchedulerEngine is used in several places in our code and not all
of these usages properly stopped the SchedulerEngine, which could lead
to test failures due to leaked threads from the SchedulerEngine. This
change adds stopping to these usages in order to avoid the thread leaks
that cause CI failures and noise.

Closes #38875
jaymode added a commit that referenced this pull request Feb 21, 2019
The SchedulerEngine is used in several places in our code and not all
of these usages properly stopped the SchedulerEngine, which could lead
to test failures due to leaked threads from the SchedulerEngine. This
change adds stopping to these usages in order to avoid the thread leaks
that cause CI failures and noise.

Closes #38875
jaymode added a commit that referenced this pull request Feb 21, 2019
The SchedulerEngine is used in several places in our code and not all
of these usages properly stopped the SchedulerEngine, which could lead
to test failures due to leaked threads from the SchedulerEngine. This
change adds stopping to these usages in order to avoid the thread leaks
that cause CI failures and noise.

Closes #38875
jasontedor added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 22, 2019
* elastic/master:
  Ensure index commit released when testing timeouts (elastic#39273)
  Avoid using TimeWarp in TransformIntegrationTests. (elastic#39277)
  Fixed missed stopping of SchedulerEngine (elastic#39193)
  [CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (elastic#39269)
  Muting AutoFollowIT.testAutoFollowManyIndices (elastic#39264)
  Clarify the use of sleep in CCR test
  Fix testCannotShrinkLeaderIndex (elastic#38529)
  Fix CCR tests that manipulate transport requests
  Align generated release notes with doc standards (elastic#39234)
  Mute test (elastic#39248)
  ReadOnlyEngine should update translog recovery state information (elastic#39238)
  Wrap accounting breaker check in assertBusy (elastic#39211)
  Simplify and Fix Synchronization in InternalTestCluster (elastic#39168)
  [Tests] Make testEngineGCDeletesSetting deterministic (elastic#38942)
  Extend nextDoc to delegate to the wrapped doc-value iterator for date_nanos (elastic#39176)
  Change ShardFollowTask to reuse common serialization logic (elastic#39094)
  Replace superfluous usage of Counter with Supplier (elastic#39048)
  Disable bwc tests for elastic#39094
weizijun pushed a commit to weizijun/elasticsearch that referenced this pull request Feb 22, 2019
The SchedulerEngine is used in several places in our code and not all
of these usages properly stopped the SchedulerEngine, which could lead
to test failures due to leaked threads from the SchedulerEngine. This
change adds stopping to these usages in order to avoid the thread leaks
that cause CI failures and noise.

Closes elastic#38875
weizijun pushed a commit to weizijun/elasticsearch that referenced this pull request Feb 22, 2019
The SchedulerEngine is used in several places in our code and not all
of these usages properly stopped the SchedulerEngine, which could lead
to test failures due to leaked threads from the SchedulerEngine. This
change adds stopping to these usages in order to avoid the thread leaks
that cause CI failures and noise.

Closes elastic#38875
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management :Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. >non-issue :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.7.0 v7.0.0-rc1 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ArrayCompareConditionSearchTests thread leak suite failure
5 participants