IBM i test failures (read ECONNRESET) #39683

Open
richardlau opened this issue Aug 6, 2021 · 23 comments
Labels
ibm i Issues and PRs related to the IBM i platform.

Comments

@richardlau
Member

Starting with yesterday's daily build, node-test-commit-ibmi is seeing several test failures with Error: read ECONNRESET.

09:26:12 not ok 1148 parallel/test-http-client-parse-error
09:26:12   ---
09:26:12   duration_ms: 1.795
09:26:12   severity: fail
09:26:12   exitcode: 1
09:26:12   stack: |-
09:26:12     node:events:371
09:26:12           throw er; // Unhandled 'error' event
09:26:12           ^
09:26:12     
09:26:12     Error: read ECONNRESET
09:26:12         at TCP.onStreamRead (node:internal/stream_base_commons:220:20)
09:26:12     Emitted 'error' event on Socket instance at:
09:26:12         at emitErrorNT (node:internal/streams/destroy:164:8)
09:26:12         at emitErrorCloseNT (node:internal/streams/destroy:129:3)
09:26:12         at processTicksAndRejections (node:internal/process/task_queues:83:21) {
09:26:12       errno: -73,
09:26:12       code: 'ECONNRESET',
09:26:12       syscall: 'read'
09:26:12     }
09:26:12   ...
09:26:35 not ok 1249 parallel/test-http-multi-line-headers
09:26:35   ---
09:26:35   duration_ms: 1.710
09:26:35   severity: fail
09:26:35   exitcode: 1
09:26:35   stack: |-
09:26:35     node:events:371
09:26:35           throw er; // Unhandled 'error' event
09:26:35           ^
09:26:35     
09:26:35     Error: read ECONNRESET
09:26:35         at TCP.onStreamRead (node:internal/stream_base_commons:220:20)
09:26:35     Emitted 'error' event on Socket instance at:
09:26:35         at emitErrorNT (node:internal/streams/destroy:164:8)
09:26:35         at emitErrorCloseNT (node:internal/streams/destroy:129:3)
09:26:35         at processTicksAndRejections (node:internal/process/task_queues:83:21) {
09:26:35       errno: -73,
09:26:35       code: 'ECONNRESET',
09:26:35       syscall: 'read'
09:26:35     }
09:26:35   ...
09:27:00 not ok 1343 parallel/test-http-server-unconsume
09:27:00   ---
09:27:00   duration_ms: 1.628
09:27:00   severity: fail
09:27:00   exitcode: 1
09:27:00   stack: |-
09:27:00     node:events:371
09:27:00           throw er; // Unhandled 'error' event
09:27:00           ^
09:27:00     
09:27:00     Error: read ECONNRESET
09:27:00         at TCP.onStreamRead (node:internal/stream_base_commons:220:20)
09:27:00     Emitted 'error' event on Socket instance at:
09:27:00         at emitErrorNT (node:internal/streams/destroy:164:8)
09:27:00         at emitErrorCloseNT (node:internal/streams/destroy:129:3)
09:27:00         at processTicksAndRejections (node:internal/process/task_queues:83:21) {
09:27:00       errno: -73,
09:27:00       code: 'ECONNRESET',
09:27:00       syscall: 'read'
09:27:00     }
09:27:00   ...
09:27:08 not ok 1372 parallel/test-http-upgrade-advertise
09:27:08   ---
09:27:08   duration_ms: 1.634
09:27:08   severity: fail
09:27:08   exitcode: 1
09:27:08   stack: |-
09:27:08     node:events:371
09:27:08           throw er; // Unhandled 'error' event
09:27:08           ^
09:27:08     
09:27:08     Error: read ECONNRESET
09:27:08         at TCP.onStreamRead (node:internal/stream_base_commons:220:20)
09:27:08     Emitted 'error' event on Socket instance at:
09:27:08         at emitErrorNT (node:internal/streams/destroy:164:8)
09:27:08         at emitErrorCloseNT (node:internal/streams/destroy:129:3)
09:27:08         at processTicksAndRejections (node:internal/process/task_queues:83:21) {
09:27:08       errno: -73,
09:27:08       code: 'ECONNRESET',
09:27:08       syscall: 'read'
09:27:08     }
09:27:08   ...
09:32:01 not ok 2470 parallel/test-tls-client-mindhsize
09:32:01   ---
09:32:01   duration_ms: 1.993
09:32:01   severity: fail
09:32:01   exitcode: 1
09:32:01   stack: |-
09:32:01     (node:3922343) SecurityWarning: DH parameter is less than 2048 bits
09:32:01     (Use `node --trace-warnings ...` to show where the warning was created)
09:32:01     node:events:371
09:32:01           throw er; // Unhandled 'error' event
09:32:01           ^
09:32:01     
09:32:01     Error: read ECONNRESET
09:32:01         at TLSWrap.onStreamRead (node:internal/stream_base_commons:220:20)
09:32:01     Emitted 'error' event on TLSSocket instance at:
09:32:01         at emitErrorNT (node:internal/streams/destroy:164:8)
09:32:01         at emitErrorCloseNT (node:internal/streams/destroy:129:3)
09:32:01         at processTicksAndRejections (node:internal/process/task_queues:83:21) {
09:32:01       errno: -73,
09:32:01       code: 'ECONNRESET',
09:32:01       syscall: 'read'
09:32:01     }
09:32:01   ...
09:32:40 not ok 2621 parallel/test-tls-write-error
09:32:40   ---
09:32:40   duration_ms: 1.874
09:32:40   severity: fail
09:32:40   exitcode: 1
09:32:40   stack: |-
09:32:40     node:events:371
09:32:40           throw er; // Unhandled 'error' event
09:32:40           ^
09:32:40     
09:32:40     Error: read ECONNRESET
09:32:40         at TCP.onStreamRead (node:internal/stream_base_commons:220:20)
09:32:40     Emitted 'error' event on TestTLSSocket instance at:
09:32:40         at emitErrorNT (node:internal/streams/destroy:164:8)
09:32:40         at emitErrorCloseNT (node:internal/streams/destroy:129:3)
09:32:40         at processTicksAndRejections (node:internal/process/task_queues:83:21) {
09:32:40       errno: -73,
09:32:40       code: 'ECONNRESET',
09:32:40       syscall: 'read'
09:32:40     }
09:32:40   ...
09:47:32 not ok 3129 pummel/test-regress-GH-892
09:47:32   ---
09:47:32   duration_ms: 3.333
09:47:32   severity: fail
09:47:32   exitcode: 1
09:47:32   stack: |-
09:47:32     expecting 33554432 bytes
09:47:32     .......... [progress dots truncated] ..........DONE
09:47:32     node:events:371
09:47:32           throw er; // Unhandled 'error' event
09:47:32           ^
09:47:32     
09:47:32     Error: read ECONNRESET
09:47:32         at TLSWrap.onStreamRead (node:internal/stream_base_commons:220:20)
09:47:32     Emitted 'error' event on ClientRequest instance at:
09:47:32         at TLSSocket.socketErrorListener (node:_http_client:447:9)
09:47:32         at TLSSocket.emit (node:events:394:28)
09:47:32         at emitErrorNT (node:internal/streams/destroy:164:8)
09:47:32         at emitErrorCloseNT (node:internal/streams/destroy:129:3)
09:47:32         at processTicksAndRejections (node:internal/process/task_queues:83:21) {
09:47:32       errno: -73,
09:47:32       code: 'ECONNRESET',
09:47:32       syscall: 'read'
09:47:32     }
09:47:32     got 33554432 bytes
09:47:32     node:assert:123
09:47:32       throw new AssertionError(obj);
09:47:32       ^
09:47:32     
09:47:32     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
09:47:32     
09:47:32     1 !== 0
09:47:32     
09:47:32         at ChildProcess.<anonymous> (/home/IOJS/build/workspace/node-test-commit-ibmi/nodes/ibmi73-ppc64/test/pummel/test-regress-GH-892.js:62:12)
09:47:32         at ChildProcess.emit (node:events:394:28)
09:47:32         at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12) {
09:47:32       generatedMessage: true,
09:47:32       code: 'ERR_ASSERTION',
09:47:32       actual: 1,
09:47:32       expected: 0,
09:47:32       operator: 'strictEqual'
09:47:32     }
09:47:32   ...

Earlier builds (e.g. https://ci.nodejs.org/job/node-test-commit-ibmi/466/) are passing -- the source change between the last passing build and the failing builds is c61870c (the libuv 1.42.0 update). For the record, the libuv builds (libuv-test-commit-ibmi and libuv-test-commit-ibmi-cmake) have been passing.

Both failing builds were on test-iinthecloud-ibmi73-ppc64_be-1. I've started a new build (in progress) on test-iinthecloud-ibmi73-ppc64_be-2 to check whether the problem is host-specific: https://ci.nodejs.org/job/node-test-commit-ibmi/469/nodes=ibmi73-ppc64/

cc @nodejs/platform-ibmi

@richardlau added the "ibm i" label (Issues and PRs related to the IBM i platform) on Aug 6, 2021
@richardlau
Member Author

@lpinca
Member

lpinca commented Aug 7, 2021

@richardlau see #39525 (comment). Can you try to rerun the tests with #36111 applied?

@richardlau
Member Author

@lpinca
Member

lpinca commented Aug 7, 2021

cc: @vtjnash

@lpinca
Member

lpinca commented Aug 7, 2021

Another possible culprit is 0e841b4. Was it included in the build?

@richardlau
Member Author

richardlau commented Aug 7, 2021

https://ci.nodejs.org/job/node-test-commit-ibmi/471/nodes=ibmi73-ppc64/ was #36111 rebased on top of 822f9ff.

https://ci.nodejs.org/job/node-test-commit-ibmi/471/nodes=ibmi73-ppc64/consoleFull

02:33:29 HEAD detached at ebea6f5369
02:33:29 nothing to commit, working tree clean
02:33:29 ++ git rev-parse HEAD
02:33:29 ebea6f5369e056169f545713e4cc8b03ec402755
02:33:29 ++ git rev-parse origin/master
02:33:29 822f9ff4e6f819cfcf37c043bfddb441197f0723
02:33:29 ++ '[' -n origin/master ']'
02:33:29 ++ git rebase --committer-date-is-author-date origin/master
02:33:33 First, rewinding head to replay your work on top of it...
02:33:36 Applying: TLS: improve handling of shutdown
02:33:36 Applying: Apply suggestions from code review

@RaisinTen
Contributor

@richardlau I think reverting libuv/libuv#3006 might fix the problem, since the error comes from line 220 of node:internal/stream_base_commons (shown below), which is part of the EOF handling:

  if (nread !== UV_EOF) {
    // CallJSOnreadMethod expects the return value to be a buffer.
    // Ref: https://github.com/nodejs/node/pull/34375
    stream.destroy(errnoException(nread, 'read'));
    return;
  }
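
To make the branch above concrete, here is a minimal sketch of how a read result could be classified. It is not the code in stream_base_commons.js: `classifyRead` is a hypothetical helper, and the value of `UV_EOF` (-4095) is hardcoded on the assumption that it matches libuv's headers, since the internal binding is not public API.

```javascript
const util = require('util');

// Assumption: libuv defines UV_EOF as -4095 on every platform.
const UV_EOF = -4095;

// Hypothetical helper mirroring the shape of the check quoted above:
// a positive nread is data, UV_EOF is a clean end of stream, and any other
// negative value is an error that would be handed to stream.destroy().
function classifyRead(nread) {
  if (nread > 0) return { kind: 'data', bytes: nread };
  if (nread === 0) return { kind: 'noop' };
  if (nread === UV_EOF) return { kind: 'eof' };
  let name;
  try {
    // Maps a negative libuv error number to its symbolic name.
    name = util.getSystemErrorName(nread);
  } catch {
    name = undefined;
  }
  return { kind: 'error', errno: nread, name };
}

console.log(classifyRead(UV_EOF).kind); // prints "eof"
```

Under this reading, the ECONNRESET failures in the logs are the last case: a negative nread that is not UV_EOF, so the stream is destroyed with an error instead of ending cleanly.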

@vtjnash
Contributor

vtjnash commented Aug 27, 2021

libuv/libuv#3006 may well exacerbate existing race conditions in nodejs (since it surfaces attempted writes-after-shutdown as errors somewhat more quickly), but it seems unlikely to cause them.

Looking at parallel/test-http-client-parse-error, this test server appears to violate the TCP protocol (valid, as that is the point of the test), which causes the client to crash (invalid, as it means the test is correctly determining that nodejs is broken here). The test defines a TCP server that does not read any of the incoming data, which means the TCP spec requires the kernel to send an RST packet, reported to the peer as ECONNRESET (exactly what we sometimes see happen there). However, that server also immediately writes data and then calls shutdown. Thus, depending on which side (client or server) can buffer its data faster and move that data over the localhost network, we expect the test to sometimes pass (if the server sees TCP FIN first) and sometimes fail (if the client sees ECONNRESET first), until the underlying nodejs bug is fixed.
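
The race described above can be observed from the client side by recording which event arrives first. The sketch below is illustrative only: `trackTeardown` and `simulate` are hypothetical helpers, not part of Node.js or of the test, and they assume a `net.Socket`-like emitter that fires `data`, `end`, and `error`.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: records whether a connection ended with a clean FIN
// ('end' event) or with a reset ('error' event carrying code ECONNRESET).
function trackTeardown(socket) {
  const state = { outcome: 'pending', bytes: 0, error: null };
  socket.on('data', (chunk) => { state.bytes += chunk.length; });
  socket.on('end', () => {
    if (state.outcome === 'pending') state.outcome = 'fin';
  });
  // Registering an 'error' listener also avoids the unhandled 'error' throw.
  socket.on('error', (err) => {
    if (state.outcome !== 'pending') return;
    state.outcome = err.code === 'ECONNRESET' ? 'reset' : 'failed';
    state.error = err;
  });
  return state;
}

// Replays a sequence of [eventName, argument] pairs on a plain emitter so the
// two orderings can be compared without depending on network timing.
function simulate(events) {
  const fake = new EventEmitter();
  const state = trackTeardown(fake);
  for (const [name, arg] of events) fake.emit(name, arg);
  return state.outcome;
}

const reset = Object.assign(new Error('read ECONNRESET'), { code: 'ECONNRESET' });
console.log(simulate([['data', Buffer.from('x')], ['end']])); // prints "fin"
console.log(simulate([['error', reset]]));                    // prints "reset"
```

With a real connection, the same helper could be attached as `trackTeardown(net.connect(port))`; which outcome is recorded would then depend on the timing discussed above.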

The nodejs failure to handle this situation correctly in the client seems to be related to #39363, though in the http stack instead of the TLS stack. My fix #36111 is somewhat related, but only for the happy path, and doesn't fix the error path in nodejs (which is what we see triggering failures in the tests above).

mhdawson added a commit to mhdawson/io.js that referenced this issue Feb 1, 2022
Refs: nodejs#39683

These are being worked on, but we really should have
marked them flaky a long time ago in order to make
the nightlies non-red.

Signed-off-by: Michael Dawson <[email protected]>
mhdawson added a commit that referenced this issue Feb 1, 2022
Refs: #39683

These are being worked on, but we really should have
marked them flaky a long time ago in order to make
the nightlies non-red.

Signed-off-by: Michael Dawson <[email protected]>

PR-URL: #41812
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Mohammed Keyvanzadeh <[email protected]>
@mhdawson
Member

mhdawson commented Feb 1, 2022

PR to exclude failures until we get them resolved - #41812

richardlau added a commit to richardlau/node-1 that referenced this issue Feb 15, 2022
Correct the names of two tests that have been marked `FLAKY` on IBM i
so they will actually be marked as such by the test runner.

Refs: nodejs#41812
Refs: nodejs#39683
nodejs-github-bot pushed a commit that referenced this issue Feb 15, 2022
Correct the names of two tests that have been marked `FLAKY` on IBM i
so they will actually be marked as such by the test runner.

Refs: #41812
Refs: #39683

PR-URL: #41984
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Mestery <[email protected]>
Reviewed-By: Beth Griggs <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
@V-for-Vasili
Contributor

Applying #36111, or reverting libuv/libuv#3006 or 0e841b4, does not solve the issue on my local machine.

The failure appears to be related to a known issue on IBM i where calling shutdown(SHUT_WR) on a TCP socket causes read() / recv() operations to fail with ECONNRESET.

In this case ECONNRESET comes from the read() call in libuv's static void uv__read(uv_stream_t* stream).

I'm looking into whether there are any drawbacks to calling uv__stream_eof to handle this (the way it is done for Cygwin and MSYS).

@vtjnash
Contributor

vtjnash commented Feb 16, 2022

The failures here are a few examples of a general issue with nodejs. They are not specific to any particular platform; they occur because the http and http2 servers are not implemented robustly against malicious remote clients: #39363

My PR (#36111) fixed a similar bug in the TLS handling, but is not related to this issue (which fails also without TLS).

@V-for-Vasili
Contributor

According to the conversation in libuv/libuv#3494, this seems to be a node bug: the node http implementation should be resilient to an unexpected RST on a TCP connection (and to clients that violate the TCP protocol in general), and should not crash the process in such cases.

In this particular case, UV_ECONNRESET is passed to read_cb by libuv as designed, and should be handled on the caller side (possibly in https://github.com/nodejs/node/blob/master/lib/internal/stream_base_commons.js#L167).

This problem appears to affect all platforms. The shutdown(SHUT_WR) behavior on IBM i just makes it more frequent and reproducible in these tests.

Looking into whether a contained solution affecting only IBM i is possible on the node side.

@V-for-Vasili
Contributor

Trying to create a test case that would reliably reproduce the problem on all platforms.

I have read through #36111 and #27916, and the most reproducible test that shows the effect of an unexpected TCP RST seems to be #27916 (comment), which fails reliably on OSX and Windows.

The minimal test could involve a simple server and a client that sends RST right after connecting.
From my attempts I have seen that a familiar error occurs:

server.js:

const net = require('net');
const server = net.createServer(function(socket) {
  socket.on('data', function(chunk) {
    console.log(`Data: ${chunk.toString()}`);
  });
  socket.on('end', function() {
    console.log('End');
  });
  socket.write('HTTP/1.1 200 OK\n\nhello world!\n');
  socket.end(function(err) {
    if (err) console.log(`socket.end: ${err}`);
  });
});
server.listen(8082, function() {
  console.log('Server listening');
});
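
The client half of such a reproduction is not shown in this comment. A minimal sketch of one, assuming the server above is listening on port 8082 and that `socket.resetAndDestroy()` is available (it was added in Node.js 18.3.0 and 16.17.0), might look like the following. `connectAndReset` is a hypothetical name, and this is not necessarily the client used in the original attempt.

```javascript
const net = require('net');

// Hypothetical client: connects, then immediately closes the TCP connection
// with an RST instead of the usual FIN handshake.
function connectAndReset(port, host = '127.0.0.1') {
  const socket = net.connect(port, host, () => {
    // Sends a TCP RST and destroys the stream once the connection is open.
    socket.resetAndDestroy();
  });
  // Keep the client itself from crashing on its own socket errors.
  socket.on('error', (err) => {
    console.error(`client error: ${err.code}`);
  });
  return socket;
}
```

Calling `connectAndReset(8082)` while server.js is running would be expected to exercise the same path: the server socket has no 'error' listener, so a failed read surfaces as an unhandled 'error' event.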

Error message:

      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (node:internal/streams/destroy:164:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}

Node.js v18.0.0-pre
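
The errno in this reproduction is -104 (the Linux value for ECONNRESET), while the IBM i CI logs earlier in the thread show -73. The number is platform-specific, whereas `code` is the same string everywhere. A small sketch of a check that relies only on `code` follows; `isConnReset` is a hypothetical helper, not a Node.js API.

```javascript
const os = require('os');

// Hypothetical helper: identify a connection reset by its symbolic code,
// because the numeric errno differs between platforms.
function isConnReset(err) {
  return err != null && err.code === 'ECONNRESET';
}

// Error fields copied from the two logs in this thread.
const fromIbmiLog = { errno: -73, code: 'ECONNRESET', syscall: 'read' };
const fromReproduction = { errno: -104, code: 'ECONNRESET', syscall: 'read' };

console.log(isConnReset(fromIbmiLog), isConnReset(fromReproduction)); // prints "true true"

// The positive errno value for ECONNRESET on whatever platform runs this.
console.log(os.constants.errno.ECONNRESET);
```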

I don't think a contained solution that would take care of IBMi is possible in node.

@mhdawson
Member

mhdawson commented Mar 2, 2022

ok, IBM i was orange again today so I think the reason I opened this has been addressed. The test still fails but is excluded.

@V-for-Vasili I don't quite understand this comment: "I don't think a contained solution that would take care of IBMi is possible in node."

@V-for-Vasili
Contributor

@mhdawson Apologies! Let me take that back; I am still looking for such a solution at the moment.

To summarize the relevant info above:
In these tests on IBM i, the read() call on the TCP handle receives ECONNRESET in libuv (due to the shutdown(SHUT_WR) bug described above), and libuv passes the UV_ECONNRESET code to the node.js callback (as it is supposed to), which crashes the tests.

The issue of properly handling unexpected TCP RST packets (responsible for ECONNRESET errors) in the Node http implementation has been brought up in #36180 and #27916, but there does not seem to be a consensus on what to do about it yet. As far as I can tell, fixing this would be a major change on the node side and would affect all platforms.

@mhdawson
Member

mhdawson commented Mar 2, 2022

@V-for-Vasili thanks, that helps me understand better.

@V-for-Vasili
Contributor

Update: A patch to resolve the test-dgram-connect failure (marked FLAKY alongside the rest) has landed: libuv/libuv#3561.

The TCP stack behavior on IBM i 7.3 and 7.4 is responsible for the rest of the failures that have to do with ECONNRESET. I am talking to the networking team about the possibility of fixing it in a PTF sometime in the future. There is no timeline yet; I will post updates here when more is known.

@V-for-Vasili
Contributor

Update: Applying the following PTFs resolves the underlying TCP stack behavior and will make the tests pass:

IBM i 7.2 - MF69728 and MF69702
IBM i 7.3 - MF69727 and MF69652
IBM i 7.4 - MF69729 and MF69730
IBM i 7.5 - MF69723 and MF69703

@richardlau

@vtjnash
Contributor

vtjnash commented May 9, 2022

The description for MF69723 looks good to me as a fix for this issue (ECONNRESET should not be set for close). MF69703 sounds to me like a new violation of the TCP standard (RST packets are supposed to destroy the read queue, superseding and discarding any waiting data).

@richardlau
Member Author

Update: Applying the following PTFs resolves the underlying TCP stack behavior and will make the tests pass:

IBM i 7.2 - MF69728 and MF69702
IBM i 7.3 - MF69727 and MF69652
IBM i 7.4 - MF69729 and MF69730
IBM i 7.5 - MF69723 and MF69703

@ThePrez Can you take care of applying the PTFs onto the CI machines?

@ThePrez
Contributor

ThePrez commented May 9, 2022

Yes. Will do!

@V-for-Vasili
Contributor

The description for MF69723 looks good to me as a fix for this issue (ECONNRESET should not be set for close). MF69703 sounds to me like a new violation of the TCP standard (RST packets are supposed to destroy the read queue, superseding and discarding any waiting data).

I have received some feedback and clarification to the above from the team that worked on both PTFs.

According to them, the change made in MF69703 does not violate the TCP standard: The sockets receive code path was updated to return valid data queued on the socket, not the TCP receive queue. If there is pending data at the TCP layer when the RST is received, it is handled according to RFC 793.

@vtjnash
Contributor

vtjnash commented May 17, 2022

I believe that RFC 793 specifies that all segment queues should be flushed and any outstanding RECEIVES should report a reset error instead:

    If the RST bit is set then, any outstanding RECEIVEs and SEND
    should receive "reset" responses.  All segment queues should be
    flushed.  Users should also receive an unsolicited general
    "connection reset" signal.  Enter the CLOSED state, delete the
    TCB, and return.

After MF69703, the code is now neither deleting the TCB nor flushing the incoming queue, and is continuing to process outstanding RECEIVEs. Either way, the client application should eventually receive the error once the queue is drained, so it is not a significant issue, but it may differ from how I understand other platforms to handle this.
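
The two readings can be contrasted with a toy model. This is only an illustration of the ordering being debated, not a description of the IBM i TCP stack or of any real kernel; `makeReceiver`, `readUntilReset`, and the `policy` values are hypothetical.

```javascript
// Toy receive queue. policy 'flush' discards queued data when an RST arrives;
// policy 'drain' keeps already-queued data and reports the reset afterwards.
function makeReceiver(policy) {
  const queue = [];
  let reset = false;
  return {
    deliver(chunk) {
      if (!reset) queue.push(chunk);
    },
    rst() {
      reset = true;
      if (policy === 'flush') queue.length = 0;
    },
    recv() {
      if (queue.length > 0) return queue.shift();
      return reset ? 'ECONNRESET' : 'EAGAIN';
    },
  };
}

// Queue two chunks, receive an RST, then read until the reset is reported.
function readUntilReset(policy) {
  const receiver = makeReceiver(policy);
  receiver.deliver('a');
  receiver.deliver('b');
  receiver.rst();
  const seen = [];
  let result;
  do {
    result = receiver.recv();
    seen.push(result);
  } while (result !== 'ECONNRESET');
  return seen;
}

console.log(readUntilReset('flush')); // only the reset is seen
console.log(readUntilReset('drain')); // 'a' and 'b' are seen, then the reset
```

In both models the application ends up with ECONNRESET; the difference is only whether the data queued before the RST is delivered first.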

nodejs-github-bot pushed a commit that referenced this issue Aug 14, 2022
Mark `test-http-pipeline-requests-connection-leak` flaky on IBM i.

PR-URL: #44215
Refs: #43509
Refs: #39683
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>