-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-72325] Define an executor and scheduler for SandboxResolvingClassLoader
#543
Conversation
4683a5a
to
c275367
Compare
SandboxResolvingClassLoader
SandboxResolvingClassLoader
SandboxResolvingClassLoader
SandboxResolvingClassLoader
c275367
to
648a16b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* <p>In the medium term, once {@link Thread#inheritedAccessControlContext} is removed upstream, we could possibly | ||
* switch to a combination of {@link Executors#newCachedThreadPool} and {@link ClassLoaderSanityThreadFactory}. | ||
* | ||
* <p>In the long term, a listener should be added to inform this class when dynamically installed plugins become | ||
* available, as described in the comments to {@link #makeParentCache(boolean)}, in which case the use of Caffeine | ||
* could possibly be removed entirely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the listener approach could probably be implemented by us at any time, whereas the removal of Thread.inheritedAccessControlContext
may take years I guess.
Environment
Apache Maven 3.9.5 (57804ffe001d7215b5e7bcb531cf83df38f93546)
Maven home: /usr/share/apache-maven-3.9.5
Java version: 21.0.1, vendor: Eclipse Adoptium, runtime: /opt/jdk-21
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.197-186.748.amzn2.x86_64", arch: "amd64", family: "unix"
Steps to reproduce
See JENKINS-72325. Run a
jenkinsci/bom
job with the following patch:This patch runs
org.jenkinsci.plugins.workflow.libs.LibraryMemoryTest
in a loop against the latest Jenkins weekly release on Java 21.Expected results
The test runs in a loop without failure. This is the actual result when running the test on Java 17.
Actual results
The test eventually fails after 1-25 iterations with
From standard error:
The failure cannot be easily reproduced locally; however, it failed in CI (with the above patch) after about 1-25 iterations against commit 2d062012 (Nov 13), commit 7ae60b1 (Nov 5), commit 430707e (October 29), commit c300386 (October 16), commit 155e2e7 (October 9), and commit 2798690 (October 2). So this is not a very recent flake.
Evaluation
Care must be taken to avoid leaking instances of
GroovyClassLoader
when computing cached values inSandboxResolvingClassLoader
. In its default configuration, Caffeine usesForkJoinPool#commonPool
as itsExecutor
. As of recent Java versions,ForkJoinPool
can capture a reference toGroovyClassLoader
by creating aForkJoinWorkerThread
whoseThread#inheritedAccessControlContext
refers to anAccessControlContext
whoseProtectionDomain
refers toGroovyClassLoader
.This is arguably a Java Platform regression, likely caused by JEP-425. However, as of JEP-411,
Thread#inheritedAccessControlContext
is deprecated for removal. Therefore, we explicitly decline to report this issue upstream, although we must contend with it in the short to medium term in Jenkins.Rejected Solution
We initially tried configuring Caffeine with the
Executor
returned byExecutors#newCachedThreadPool
. However, this merely traded one set of problems for another. ThisExecutor
was seen to also capture a reference toGroovyClassLoader
by creating aThread
whoseThread#inheritedAccessControlContext
refers to anAccessControlContext
whoseProtectionDomain
refers toGroovyClassLoader
:Additionally, when the thread pool's
ThreadFactory
is not wrapped byClassLoaderSanityThreadFactory
(see JENKINS-49206), theExecutor
can sometimes createThread
instances whoseThread#contextClassLoader
refers toGroovyClassLoader
:Given the problems with this approach, we rejected a solution of
Executors#newCachedThreadPool
.Solution
We create a dedicated
Executors#newSingleThreadExecutor
, which is safe for use with Caffeine from a memory perspective because:ForkJoinPool#commonPool
, the thread is eagerly created and avoids references toGroovyClassLoader
inThread#inheritedAccessControlContext
.Executors#newCachedThreadPool
, the thread is eagerly created and avoids references toGroovyClassLoader
inThread#inheritedAccessControlContext
.Executors#newCachedThreadPool
, the thread is eagerly created and avoids references toGroovyClassLoader
inThread#contextClassLoader
, thereby avoiding the need forClassLoaderSanityThreadFactory
.A single-threaded
Executor
is safe for use with Caffeine from a CPU perspective because the cache's work is implemented with cheap O(1) algorithms.Future work
In the medium term, once
Thread#inheritedAccessControlContext
is removed upstream, we could possibly switch to a combination ofExecutors#newCachedThreadPool
andClassLoaderSanityThreadFactory
.The comments state:
If this were done, then this two-level Caffeine cache could possibly be replaced by something based on ClassValue. See, for example, org.kohsuke.stapler.ClassLoaderValue. We explicitly decline to perform this cleanup, but we have updated the existing wish list comment to add this cleanup to the list.
Implementation notes
See the code comments.
Testing done
We tested by running the test in a loop in CI (the only environment where the problem can be reliably reproduced in under 50 iterations) in four configurations:
In configuration 3, the test always failed within 50 iterations with the error shown above. In configuration 1, the test passed reliably after two hours of iterations. After this PR, configurations 2 and 4 are passing reliably after two hours of iterations on both Java 17 and Java 21.
In addition to the above tests, I also ran a successful
jenkinsci/bom
test with theweekly-test
label to test PCT for all plugins against the weekly release line.