net: enable TCP_NODELAY by default on all platforms #906

Closed
silverwind opened this issue Feb 20, 2015 · 42 comments
Labels
help wanted Issues that need assistance from volunteers or PRs that need help to proceed. net Issues and PRs related to the net subsystem. stalled Issues and PRs that are stalled.

Comments

@silverwind
Contributor

Just saw nodejs/node-v0.x-archive#9235 and this might be a good addition for platform consistency for us too. I'm just not sure if the proposed patch is the best place to put the call.

@silverwind
Contributor Author

I may have to dig deeper here. Windows, for example, doesn't default to TCP_NODELAY either, so it seems questionable whether we should rely on platform defaults as the issue implies.

@silverwind silverwind changed the title Enable TCP_NODELAY by default on all platforms net: enable TCP_NODELAY by default on all platforms Feb 20, 2015
@meandmycode

+1, this seems like a bug. My initial thought was that the docs should be changed, but it seems nodelay is highly recommended as the default for today's networking. I would hazard a guess that OS platforms not already doing this are playing it safe, whereas iojs can be more reactive to current practice.

@silverwind
Contributor Author

Yes, Nagle's algorithm isn't really beneficial in today's latency-limited networks. I wonder if it's possible to test from JS whether it is enabled on a target host.

@mscdex
Contributor

mscdex commented Feb 20, 2015

+1

@trevnorris trevnorris added the net Issues and PRs related to the net subsystem. label Feb 20, 2015
@brendanashworth
Contributor

+1 from me too, I'd love to see this.

@bnoordhuis
Member

The fact that the documentation says that TCP_NODELAY is enabled by default seems like a documentation bug to me. As far as I know, that has never been true.

I'm on the fence as to whether it's a good idea to enable it by default. I'd be more comfortable +1'ing that if I had more faith in the net module's capabilities of batching writes. That's not to say I object, just that I'm not sure if defaulting to TCP_NODELAY is an unequivocally good thing.

@silverwind
Contributor Author

If we go through with enabling it everywhere, we need a mechanism for users to disable it globally too. I think ideally on a per-process basis, maybe something like process.setNoDelay().

The reason is simple: with net only exposing this option on single sockets, there's no way every module is going to expose the option to its parent.

@silverwind
Contributor Author

I think regardless of whether we enable it by default, a process.setNoDelay() could generally prove useful.

While it can be set at kernel-level, I think controlling it on a per-program basis is preferred. For example, nginx has the option to set it per-server.

@sam-github
Contributor

@bnoordhuis I wonder if adding a net.setNoDelay() to allow the default to be changed would be useful. Users in the field who are trying to squeeze more performance out could report back if any of them actually find the feature useful. If no one finds it useful to change no delay globally, there's no reason to change the default.

@silverwind
Contributor Author

A net.setNoDelay() was my first idea too, but if I understand this correctly, that would require the user to require('net') before anything else to work on all connections following the call. We probably don't want the require order to be significant.

Or would it be possible to apply TCP_NODELAY retroactively to all sockets of a program?

@piscisaureus
Contributor

I generally dislike any type of "global" setting. You wanted to speed up your http server, but without you realizing it, the database driver just got slower.

Node should pick reasonable defaults (so maybe switch it on by default), behave consistently on all platforms (fix windows and/or aix), and be easy to configure (maybe setNoDelay is too difficult for http servers). Let's keep bikeshedding within this parameter space.
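
For illustration, a per-server opt-in (as opposed to a global switch) could look roughly like the sketch below. It assumes the server emits the standard 'connection' event with the accepted net.Socket; enableNoDelay is a hypothetical helper name, not an existing API, and the port in the comment is arbitrary.

```javascript
'use strict';
const http = require('http');

// Hypothetical helper (not part of node/io.js): turn on TCP_NODELAY for
// every connection a server accepts, without touching any global state.
function enableNoDelay(server) {
  server.on('connection', (socket) => {
    socket.setNoDelay(true);
  });
  return server;
}

// Usage: works for http.Server and net.Server alike, since both emit
// 'connection' with the accepted socket.
const server = enableNoDelay(http.createServer((req, res) => {
  res.end('ok\n');
}));
// server.listen(8080); // port chosen arbitrarily for illustration
```

This keeps the setting scoped to one server, so a database driver living in the same process is unaffected.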

@silverwind
Contributor Author

As a first step, I'm trying to verify that setNoDelay is working when set on both client and server:

"use strict";
const net = require("net");
const port = 4000;
let timeClient, timeServer;

function time() {
    let hrtime = process.hrtime();
    return hrtime[0] * 1e9 + hrtime[1];
}

net.createServer(function(socket) {
    socket.setNoDelay(true); // correct?
    timeServer = time();
    socket.write("\0");
    socket.on("data", function () {
        console.log("Client -> Server " + ((time() - timeClient) / 1000).toFixed(0) + "µs");
    });
}).listen(port);

setInterval(function() {
    let socket = new net.Socket();
    socket.setNoDelay(true); // correct?
    socket.on("data", function () {
        console.log("Server -> Client " + ((time() - timeServer) / 1000).toFixed(0) + "µs");
        timeClient = time();
        socket.write("\0");
        socket.end();
    }).connect(port);
}, 200);

Is this the correct usage? @evanlucas maybe you can have a look, as you've dealt with these socket options recently.

edit: updated to measure both delays. also: i fail at math, it's µs :)
edit2: I probably need to send out writes way faster to notice the nagling.

@bnoordhuis
Member

@silverwind net.Socket#setNoDelay() is a no-op when the socket isn't connected yet. Changing the call to socket.once('connect', socket.setNoDelay) should work around that.
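
A minimal sketch of that workaround, wrapped in a promise for convenience. connectNoDelay and its default host are hypothetical names chosen for illustration; the only calls relied on are net.connect(), the 'connect' event and socket.setNoDelay().

```javascript
'use strict';
const net = require('net');

// Hypothetical helper: connect first, then disable Nagle's algorithm once
// the 'connect' event has fired, as suggested above.
function connectNoDelay(port, host = '127.0.0.1') {
  return new Promise((resolve, reject) => {
    const socket = net.connect(port, host);
    socket.once('error', reject);
    socket.once('connect', () => {
      // From here on, error handling is the caller's responsibility.
      socket.removeListener('error', reject);
      socket.setNoDelay(true);
      resolve(socket);
    });
  });
}

// Usage: connectNoDelay(4000).then((socket) => socket.write('\0'));
```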

@silverwind
Contributor Author

Right, that should fix the client part. I'll focus my tests on the server part now.

@mscdex
Contributor

mscdex commented Mar 6, 2015

> @silverwind net.Socket#setNoDelay() is a no-op when the socket isn't connected yet. Changing the call to socket.once('connect', socket.setNoDelay) should work around that.

Wasn't this issue brought up before? I can't remember if that was fixed already or if it's still being worked on ...

EDIT: Still in progress.... #880

@silverwind
Contributor Author

@mscdex #880

@silverwind
Contributor Author

Much better test for the server: https://gist.github.com/silverwind/a7a702e05b3a69578f58

I see single-byte packets on the wire now, so it's definitely working. @bnoordhuis care to elaborate on your concerns about batching? I've yet to read through the code, but I assume TCP_CORK is involved? Any suggestion on how to verify/test the batching?

@bnoordhuis
Member

@silverwind Not TCP_CORK, just smart coalescing of writes in libuv or io.js. Here's an example of what I mean:

var socket = /* ... */;
function f() { socket.write('x'); }
setImmediate(f);
setImmediate(f);

The two writes should ideally be folded into a single write(2) system call (and therefore a single TCP packet) but io.js is currently not nearly smart enough to pull that off.
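
This is not the automatic coalescing described above, but an application can batch explicitly with the stream's cork()/uncork() methods. The sketch below assumes the stream provides a writev implementation (net.Socket does), so the corked chunks are handed over together; whether they end up in a single TCP packet still depends on their size and on the kernel. writeBatched is a hypothetical helper name.

```javascript
'use strict';

// Hypothetical helper: queue several small writes and flush them together.
// cork() buffers writes inside the stream; uncork() passes the buffered
// chunks to the stream's writev implementation in one call when it has one.
function writeBatched(stream, chunks) {
  stream.cork();
  for (const chunk of chunks) {
    stream.write(chunk);
  }
  stream.uncork();
}

// Usage: writeBatched(socket, ['x', 'x']);
```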

@silverwind
Contributor Author

One more thing I noticed in the test is that on the receiving end, the data events get batched heavily too. So even if we set TCP_NODELAY on the sender, there seems to be no way to get the bytes faster on the receiving end. I think maybe a packet event might be in order to really profit from TCP_NODELAY on both sides, if the OS exposes such a thing.

@bnoordhuis
Member

That's expected. I think that if you run strace -e read on the receiving process, you'll find that several packets are sometimes read with a single system call. It's a good thing because system calls are expensive. :-)
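
Because reads are batched like this, a receiver cannot rely on one 'data' event per packet and has to frame messages itself. A minimal sketch, assuming a newline-delimited protocol purely for illustration (the thread does not prescribe any framing); createLineSplitter is a hypothetical helper name.

```javascript
'use strict';
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: turn arbitrarily chunked 'data' events into whole
// newline-delimited messages, regardless of how many packets were read at
// once or where a chunk happens to be split.
function createLineSplitter(onMessage) {
  const decoder = new StringDecoder('utf8');
  let buffered = '';
  return function onData(chunk) {
    buffered += decoder.write(chunk);
    let index;
    while ((index = buffered.indexOf('\n')) !== -1) {
      onMessage(buffered.slice(0, index));
      buffered = buffered.slice(index + 1);
    }
  };
}

// Usage: socket.on('data', createLineSplitter((msg) => console.log(msg)));
```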

@silverwind
Contributor Author

Current status: I still intend to flip the switch here if there's no performance impact in doing so. Testing for it is the hard part.

@Trott
Member

Trott commented Feb 17, 2016

@silverwind Is this still something that's being worked on?

@silverwind
Contributor Author

Not currently working on it but if anyone wants to benchmark with nodelay, the patch in nodejs/node-v0.x-archive#9235 is a good start.

@Trott
Member

Trott commented Feb 17, 2016

@silverwind Should this be closed? Or stay open and maybe tagged with help-wanted?

@silverwind silverwind added the help wanted Issues that need assistance from volunteers or PRs that need help to proceed. label Feb 18, 2016
@silverwind
Contributor Author

I think it's definitely something we should strive to add. We just need to assess that it doesn't regress throughput too much, maybe by adding a benchmark to showcase the difference.

@Fishrock123
Contributor

@nodejs/benchmarking could we run an acmeair test on the benchmark machine with this enabled? Would that be sufficient?

@mhdawson
Member

mhdawson commented Nov 8, 2016

I can look at running acmeair with nodelay set. Is this still the patch to be used?

diff --git a/lib/net.js b/lib/net.js
index d353ff7..d655dd3 100644
--- a/lib/net.js
+++ b/lib/net.js
@@ -48,7 +48,12 @@ function createPipe() {
 // constructor for lazy loading
 function createTCP() {
   var TCP = process.binding('tcp_wrap').TCP;
-  return new TCP();
+  var handle = new TCP();
+  // The api indicates the default for noDelay is true. This is not
+  // true by default at the OS level on all platforms (ex AIX)
+  // so set it to ensure it is set at the OS level.
+  handle.setNoDelay(true);
+  return handle;
 }

@Fishrock123
Contributor

cc @silverwind

@silverwind
Contributor Author

Yes, that should be it. Keep in mind that the OS might already default to having it enabled, so you likely need to disable it in sysctl or similar to see any real effect.

@mhdawson
Member

The current benchmarking setup only supports Linux x86. Does anybody in this discussion know if TCP_NODELAY is enabled or disabled by default on this platform? I can search for the answer, but if anybody already knows based on the past discussion it would save me some time.

@bnoordhuis
Member

@mhdawson TCP_NODELAY is disabled by default on Linux and there is no sysctl to override that, as far as I'm aware. A quick check of net/ipv4/tcp.c seems to confirm that.

@Fishrock123
Contributor

ping @mhdawson ^

@mhdawson
Member

mhdawson commented Nov 15, 2016

Applied this patch: https://github.com/mhdawson/io.js/commit/eb83dc31db2f68a67473dc6fdbcd521994d2b446.patch

Initial results

BEFORE CHANGE: https://ci.nodejs.org/view/All/job/benchmark-footprint-experimental-TCP_NODELAY/1/

+ cat acmerun
+ grep metric throughput 
+ awk {print $3}
+ ACME_THROUGHPUT=2330.13
+ cat acmerun
+ grep metric latency
+ awk {print $3}
+ ACME_LATENCY=10.2826
+ cat acmerun
+ grep metric pre footprint
+ awk {print $4}
+ ACME_PREFOOTPRINT=103296
+ cat acmerun
+ grep metric post footprint
+ awk {print $4}
+ ACME_POSTFOOTPRINT=98368

AFTER CHANGE: https://ci.nodejs.org/view/All/job/benchmark-footprint-experimental-TCP_NODELAY/5/consoleFull

+ cat acmerun
+ grep metric throughput 
+ awk {print $3}
+ ACME_THROUGHPUT=2331.48
+ cat acmerun
+ grep metric latency
+ awk {print $3}
+ ACME_LATENCY=10.2697
+ cat acmerun
+ grep metric pre footprint
+ awk {print $4}
+ ACME_PREFOOTPRINT=105388
+ cat acmerun
+ grep metric post footprint
+ awk {print $4}
+ ACME_POSTFOOTPRINT=100800

Assuming that it was off by default and that my patch properly turns it on where it matters, it does not seem to have affected latency or throughput to a noticeable degree. More runs would likely be needed to confirm.

@silverwind
Contributor Author

@mhdawson what packet size are you using in these tests? Can you try to vary them?

@mhdawson
Member

mhdawson commented Feb 2, 2017

Do you have the command to vary the packet sizes from the command line? If so, I can launch some runs with the values you'd like.

@silverwind
Contributor Author

On Linux, it should be ip link set dev eth0 mtu <bytes>, where <bytes> is at most 1500 on standard Ethernet.

@Trott
Member

Trott commented Jul 16, 2017

Should this remain open?

@silverwind
Contributor Author

I'd say so.

@BridgeAR
Member

I just stumbled upon this, and I know that Nagle's algorithm has a huge impact when you send very tiny buffer chunks and wait for the result, as is done in the node_redis benchmarks.

Results

// Without Nagle
   SET 4B buf,         1/1 avg/max:   0.05/  3.07 2501ms total,   18224 ops/sec
// With Nagle
   SET 4B buf,         1/1 avg/max:  44.10/ 47.97 2514ms total,      23 ops/sec

So I definitely think this is a good idea. I do not know of any negative side effects but I only checked this for node_redis.

@silverwind
Contributor Author

@BridgeAR do you think you could come up with a test that verifies that it's enabled? This was the part that I was struggling with last time. Other than that, the change itself to enable it should be just setting the socket option.
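
One possible starting point for such a test is a timing heuristic rather than a direct check, since the option cannot be read back through the net API used here. The sketch below measures a write-write-read exchange over loopback, the pattern that Nagle's algorithm combined with delayed ACKs tends to penalize. measureRoundTrip, the single-byte payloads and the round count are all assumptions made for illustration, and the resulting numbers depend on the OS, so any threshold built on them would need care.

```javascript
'use strict';
const net = require('net');

// Hypothetical helper: average round-trip time in milliseconds for a
// write-write-read exchange over loopback, with TCP_NODELAY on or off.
function measureRoundTrip(noDelay, rounds = 20) {
  return new Promise((resolve, reject) => {
    const server = net.createServer((socket) => {
      socket.setNoDelay(noDelay);
      socket.on('error', reject);
      let received = 0;
      socket.on('data', (chunk) => {
        // Answer each complete 2-byte request, however the bytes arrive.
        received += chunk.length;
        while (received >= 2) {
          received -= 2;
          socket.write('r');
        }
      });
    });
    server.on('error', reject);
    server.listen(0, '127.0.0.1', () => {
      const client = net.connect(server.address().port, '127.0.0.1');
      client.on('error', reject);
      client.once('connect', () => {
        client.setNoDelay(noDelay);
        let replies = 0;
        const start = process.hrtime.bigint();
        // Two separate small writes: with Nagle active, the second one may
        // be held back until the first has been acknowledged.
        const send = () => {
          client.write('a');
          client.write('b');
        };
        client.on('data', (chunk) => {
          replies += chunk.length;
          if (replies < rounds) return send();
          const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
          client.end();
          server.close();
          resolve(elapsedMs / rounds);
        });
        send();
      });
    });
  });
}

// Usage: compare both settings; treat the numbers as indicative only.
// measureRoundTrip(true).then((on) => measureRoundTrip(false).then((off) =>
//   console.log({ noDelayMs: on, nagleMs: off })));
```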

@Fishrock123
Contributor

ping @BridgeAR

@refack refack added the stalled Issues and PRs that are stalled. label Nov 15, 2018
@refack
Contributor

refack commented Nov 15, 2018

Over a year with no update, so I'm going to close this.
Feel free to ping me if this should be reopened.
