Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow tuning of prefetch #721

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

basil
Copy link
Member

@basil basil commented Jan 12, 2024

Context

Please read this document in full before reviewing this pull request, as otherwise it will be impossible to understand what I am trying to achieve here.

Problem

See JENKINS-72226. On high-latency connections, such as wide area networks, Remoting can take quite a while to initialize e.g. JGit due to a large number of class loading RPCs.

Solution

In the current design, the main mechanism we have to reduce latency is prefetch. An existing comment states:

// TODO could even traverse second-level dependencies, etc.

This PR makes prefetch level a tunable but does not change the default behavior of traversing one level of dependencies. It does allow the level to be set to 0 (effectively disabling prefetch) or > 1 (second-level or greater).

When the level is set > 1, more classes will be loaded on the server (taking up more memory) and more class metadata will be prefetched and sent to the client (also taking up more memory on the client). However, once it is sent, there is no need to make any further RPCs, so latency is improved.

To allow prefetch to be set to a high value without quickly consuming the entire dependency graph, we introduce a prune threshold. This prevents prefetch from continuing to search beyond the current file at the next level. The default prune threshold matches current behavior, allowing prefetch to traverse new files at level 1. The default prune threshold allows the prefetch level to be set very high (100 or 500) to effectively preload as many classes as possible from a single JAR. This is not by accident, as the Remoting JAR caching feature also sends JAR data one JAR at a time. So the idea is to send the JAR file and prefetch all classes inside of it to minimize RPCs.

Since this will consume more memory which could negatively impact the normal use case, I am not changing the defaults. But in practice with simple JGit testing I have never seen more than a dozen or so MB of increase in memory usage, which is really quite small considering the latency improvement. Of course, this could be a lot worse in production use cases with lots of big JAR files.

The main benefit of this PR is that it allows for experimentation. Users with latency problems can tune the prefetch depth to a high value like 500 and see if the decreased latency is worth the increased heap usage. If it is, they can keep the setting, and if it isn't they can drop it. In my simple testing, I saw quite a decent benefit at a very reasonable cost.

As I mention in the other document, the only way to solve this problem in an elegant fashion in the default configuration is to perform a drastic rewrite of this whole subsystem, which I have not attempted to do in this PR.

Implementation

Refactored the existing code into a recursive method.

Testing done

All testing was done with a server with the changes from this PR and a modified client. The client was modified as follows:

diff --git a/src/main/java/hudson/remoting/Channel.java b/src/main/java/hudson/remoting/Channel.java
index 702006f4..a2aafefb 100644
--- a/src/main/java/hudson/remoting/Channel.java
+++ b/src/main/java/hudson/remoting/Channel.java
@@ -52,11 +52,13 @@ import java.lang.ref.WeakReference;
 import java.net.URL;
 import java.nio.channels.ClosedChannelException;
 import java.nio.charset.StandardCharsets;
+import java.security.SecureRandom;
 import java.util.Collections;
 import java.util.Date;
 import java.util.List;
 import java.util.Locale;
 import java.util.Map;
+import java.util.Random;
 import java.util.WeakHashMap;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.CopyOnWriteArrayList;
@@ -122,6 +124,7 @@ import java.util.stream.Stream;
  * @author Kohsuke Kawaguchi
  */
 public class Channel implements VirtualChannel, IChannel, Closeable {
+    private static final Random RANDOM = new SecureRandom();
     private final CommandTransport transport;
 
@@ -765,6 +768,18 @@ public class Channel implements VirtualChannel, IChannel, Closeable {
         if(logger.isLoggable(Level.FINE))
             logger.fine("Send "+cmd);
 
+        if (cmd instanceof RemoteInvocationHandler.RPCRequest) {
+            RemoteInvocationHandler.RPCRequest rpcCmd = (RemoteInvocationHandler.RPCRequest) cmd;
+            if ("hudson.remoting.RemoteClassLoader$IClassLoader".equals(rpcCmd.declaringClassName)) {
+                try {
+                    Thread.sleep(RANDOM.nextInt(250));
+                } catch (InterruptedException e) {
+                    Thread.currentThread().interrupt();
+                }
+                logger.log(Level.WARNING, "Executing IClassLoader RPC: " + rpcCmd.toString());
+            }
+        }
+
         transport.write(cmd, cmd instanceof CloseCommand);
         commandsSent.incrementAndGet();
         lastCommandSentAt.set(System.currentTimeMillis());
diff --git a/src/main/java/hudson/remoting/RemoteClassLoader.java b/src/main/java/hudson/remoting/RemoteClassLoader.java
index 84518a82..b21cf9ec 100644
--- a/src/main/java/hudson/remoting/RemoteClassLoader.java
+++ b/src/main/java/hudson/remoting/RemoteClassLoader.java
@@ -353,6 +353,7 @@ final class RemoteClassLoader extends URLClassLoader {
                     invokeClassReferenceLoadTestingHookIfNeeded();
 
                     Map<String, ClassFile2> all = proxy.fetch3(name);
+                    LOGGER.log(WARNING, "Client has prefetched " + all.size() + " classes");
                     synchronized (prefetchedClasses) {
diff --git a/src/main/java/hudson/remoting/RemoteInvocationHandler.java b/src/main/java/hudson/remoting/RemoteInvocationHandler.java
index 31ba0055..2893ed47 100644
--- a/src/main/java/hudson/remoting/RemoteInvocationHandler.java
+++ b/src/main/java/hudson/remoting/RemoteInvocationHandler.java
@@ -862,7 +862,7 @@ final class RemoteInvocationHandler implements InvocationHandler, Serializable {
          * Null if deserialized historically.
          */
         @CheckForNull
-        private final String declaringClassName;
+        public final String declaringClassName;
 
         protected final String methodName;
         /**

The purpose of the sleep statement is to simulate the effects of a slow network on Remoting class loading. The other two changes are log statements in order to measure number of class loading RPCs and number of preloaded classes. The JAR cache was cleared between each test.

Server configuration            Job duration (sec)     # Classes prefetched      # Class loading RPCs
Before this PR                  58                     1682                      518
Prefetch depth 0                125                    1024                      1052
Prefetch depth 1                62                     1682                      520
Prefetch depth 2                59                     2016                      437
Prefetch depth 3                51                     2220                      374
Prefetch depth 5                46                     2822                      350
Prefetch depth 10               37                     3349                      292
Prefetch depth 50               38                     4175                      344
Prefetch depth 100              32                     4605                      268
Prefetch depth 500              31                     4655                      266     

This spreadsheet shows the results with a logarithmic trend line.

The results are as expected. When prefetch is disabled, the job runs more slowly and with more RPCs but with lower memory usage. When prefetch is set to the default value, the results are as before this PR. The slopes of the logarithmic trend lines are negative for prefetch depth and # of RPCs as expected. The slope is positive for # of classes prefetched as expected.

The job doubled in duration when we disabled prefetch completely, and it halved in duration by the time we got to a prefetch level of 500, while # of classes loaded only grew by 4x. This might seem like a lot, but note that the difference between prefetch 1 (the default) and 500 is only 12 MB worth of increased heap usage. This is really a small cost to pay for the drastic improvement in latency. However, it can be hard to predict the increase in heap usage in the general case, particularly if plugins are installed that happen to contain huge JAR files. So to avoid regression, we leave the existing behavior as default and simply recommend that anyone who is having latency problems set this to a high value (e.g., 500).

@basil basil added the enhancement For changelog: An enhancement providing new capability. label Jan 12, 2024
@jglick
Copy link
Member

jglick commented Oct 24, 2024

(merge conflict)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement For changelog: An enhancement providing new capability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants