Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky: parallel/test-http2-* on Windows #20750

Closed
Trott opened this issue May 15, 2018 · 54 comments · Fixed by #37631
Closed

flaky: parallel/test-http2-* on Windows #20750

Trott opened this issue May 15, 2018 · 54 comments · Fixed by #37631
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. http2 Issues or PRs related to the http2 subsystem. windows Issues and PRs related to the Windows platform.

Comments

@Trott
Copy link
Member

Trott commented May 15, 2018

  • Version: v11.0.0-pre (master)
  • Platform: win2008r2-vs2017,
  • Subsystem: test http2
not ok 251 parallel/test-http2-pipe
  ---
  duration_ms: 0.323
  severity: fail
  exitcode: 1
  stack: |-
    (node:6548) ExperimentalWarning: The http2 module is an experimental API.
    events.js:167
          throw er; // Unhandled 'error' event
          ^
    
    Error: read ECONNRESET
        at TCP.onread (net.js:658:25)
    Emitted 'error' event at:
        at emitErrorNT (internal/streams/destroy.js:82:8)
        at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
        at process._tickCallback (internal/process/next_tick.js:63:19)
  ...

@nodejs/build @nodejs/platform-windows @nodejs/testing @nodejs/http2

[refack: edited info and added link to test file]

@Trott Trott added windows Issues and PRs related to the Windows platform. test Issues and PRs related to the tests. flaky-test Issues and PRs related to the tests with unstable failures on the CI. http2 Issues or PRs related to the http2 subsystem. labels May 15, 2018
@apapirovski
Copy link
Member

This happens as a result of the recent change to destroy the socket properly but it might be revealing an actual bug rather than just being a flaky test. That test does a pretty basic pipe operation, it shouldn't ever trigger an ECONNRESET.

ping @nodejs/http2

@apapirovski
Copy link
Member

This also just happened in parallel/test-http2-client-upload. Same root cause. It's definitely a bug.

https://ci.nodejs.org/job/node-test-binary-windows/17277/COMPILED_BY=vs2017,RUNNER=win2016,RUN_SUBSET=3/tapResults/

@Trott
Copy link
Member Author

Trott commented May 16, 2018

For posterity (since the link provided by @apapirovski will eventually 404):

not ok 231 parallel/test-http2-client-upload
  ---
  duration_ms: 0.220
  severity: fail
  exitcode: 1
  stack: |-
    (node:2492) ExperimentalWarning: The http2 module is an experimental API.
    events.js:167
          throw er; // Unhandled 'error' event
          ^
    
    Error: read ECONNRESET
        at TCP.onread (net.js:658:25)
    Emitted 'error' event at:
        at emitErrorNT (internal/streams/destroy.js:82:8)
        at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
        at process._tickCallback (internal/process/next_tick.js:63:19)
  ...

Host: test-azure_msft-win2016-x64-2

The fact that it's happening on both Azure and Rackspace, win2008 and win2016, seems to suggest (to me at least) a bug in Node.js or a dependency like libuv.

@apapirovski
Copy link
Member

Pretty sure it's because we use setImmediate to destroy rather than having some more certain method of confirming that everything has been sent and we're truly done. Hopefully someone else from the http2 team will have a look soon. I'm not on Windows and it's not a problem on macOS as far as I can tell.

@mcollina
Copy link
Member

The bug on pipe happens on a mac too.

@apapirovski
Copy link
Member

@mcollina I can't seem to replicate on mine :( This is what I run it with: tools/test.py -j 96 --repeat 960 parallel/test-http2-pipe.

@Trott
Copy link
Member Author

Trott commented May 16, 2018

I also can't replicate on macOS nor do I see instances looking through macOS failures in Ci. It may very well be a bug on macOS too, but if so, we're definitely seeing it with much greater frequency on Windows in CI.

@apapirovski
Copy link
Member

apapirovski commented May 16, 2018

@mcollina
Copy link
Member

=== release test-http2-pipe ===
Path: parallel/test-http2-pipe
(node:37337) ExperimentalWarning: The http2 module is an experimental API.
events.js:167
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onread (net.js:658:25)
Emitted 'error' event at:
    at emitErrorNT (internal/streams/destroy.js:82:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
    at process._tickCallback (internal/process/next_tick.js:63:19)
Command: out/Release/node /Users/matteo/Repositories/node/test/parallel/test-http2-pipe.js
[01:05|% 100|+ 938|-  22]: Done

On my system.

@apapirovski apapirovski added confirmed-bug Issues with confirmed bugs. and removed flaky-test Issues and PRs related to the tests with unstable failures on the CI. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform. labels May 19, 2018
MylesBorins pushed a commit that referenced this issue May 22, 2018
Currently http2 does not properly submit GOAWAY frames when a session
is being destroyed. It also doesn't properly handle when the other
party severs the connection after sending a GOAWAY frame, even though
it should.

Edge, IE & Safari are currently unable to handle empty TRAILERS
frames despite them being correctly to spec. Instead send an empty
DATA frame with END_STREAM flag in those situations.

Fix and adjust several flaky and/or incorrect tests.

PR-URL: #20772
Fixes: #20705
Fixes: #20750
Fixes: #20850
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit that referenced this issue May 22, 2018
It's possible for the connections to take too long and since the server
is already unrefed, the process will just exit. Instead adjust the test
so that server unref only happens after all sessions have been
successfuly established and unrefed. That still tests the same condition
but will not fail under load.

PR-URL: #20772
Fixes: #20705
Fixes: #20750
Fixes: #20850
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit that referenced this issue May 23, 2018
Currently http2 does not properly submit GOAWAY frames when a session
is being destroyed. It also doesn't properly handle when the other
party severs the connection after sending a GOAWAY frame, even though
it should.

Edge, IE & Safari are currently unable to handle empty TRAILERS
frames despite them being correctly to spec. Instead send an empty
DATA frame with END_STREAM flag in those situations.

Fix and adjust several flaky and/or incorrect tests.

PR-URL: #20772
Fixes: #20705
Fixes: #20750
Fixes: #20850
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit that referenced this issue May 23, 2018
It's possible for the connections to take too long and since the server
is already unrefed, the process will just exit. Instead adjust the test
so that server unref only happens after all sessions have been
successfuly established and unrefed. That still tests the same condition
but will not fail under load.

PR-URL: #20772
Fixes: #20705
Fixes: #20750
Fixes: #20850
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@Trott
Copy link
Member Author

Trott commented May 25, 2018

It's back. /ping @apapirovski

https://ci.nodejs.org/job/node-test-binary-windows/17579/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=2/console

Host: test-rackspace-win2008r2-x64-4

not ok 252 parallel/test-http2-pipe
  ---
  duration_ms: 0.274
  severity: fail
  exitcode: 1
  stack: |-
    (node:5876) ExperimentalWarning: The http2 module is an experimental API.
    events.js:167
          throw er; // Unhandled 'error' event
          ^
    
    Error: read ECONNRESET
        at TCP.onread (net.js:657:25)
    Emitted 'error' event at:
        at emitErrorNT (internal/streams/destroy.js:82:8)
        at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
        at process._tickCallback (internal/process/next_tick.js:63:19)
  ...

Flarna added a commit to dynatrace-oss-contrib/node that referenced this issue Dec 18, 2019
Wait for close event on server stream before shuting down server and
client to avoid races seen on windows CI.

Refs: nodejs#20750 (comment)
BridgeAR pushed a commit that referenced this issue Dec 20, 2019
Wait for close event on server stream before shuting down server and
client to avoid races seen on windows CI.

Refs: #20750 (comment)

PR-URL: #29889
Refs: #29852
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
BridgeAR pushed a commit that referenced this issue Jan 3, 2020
Wait for close event on server stream before shuting down server and
client to avoid races seen on windows CI.

Refs: #20750 (comment)

PR-URL: #29889
Refs: #29852
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
targos pushed a commit that referenced this issue Jan 14, 2020
Wait for close event on server stream before shuting down server and
client to avoid races seen on windows CI.

Refs: #20750 (comment)

PR-URL: #29889
Refs: #29852
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
addaleax added a commit to addaleax/node that referenced this issue Feb 1, 2020
Alternative to nodejs#31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: nodejs#31590
Refs: nodejs#20750
BethGriggs pushed a commit that referenced this issue Feb 6, 2020
Wait for close event on server stream before shuting down server and
client to avoid races seen on windows CI.

Refs: #20750 (comment)

PR-URL: #29889
Refs: #29852
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Trott pushed a commit to Trott/io.js that referenced this issue Feb 6, 2020
Alternative to nodejs#31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: nodejs#31590
Refs: nodejs#20750

PR-URL: nodejs#31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
codebytere pushed a commit that referenced this issue Feb 17, 2020
Alternative to #31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: #31590
Refs: #20750

PR-URL: #31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
codebytere pushed a commit that referenced this issue Mar 15, 2020
Alternative to #31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: #31590
Refs: #20750

PR-URL: #31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
codebytere pushed a commit that referenced this issue Mar 17, 2020
Alternative to #31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: #31590
Refs: #20750

PR-URL: #31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
codebytere pushed a commit that referenced this issue Mar 30, 2020
Alternative to #31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: #31590
Refs: #20750

PR-URL: #31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Trott added a commit to Trott/io.js that referenced this issue Mar 6, 2021
It hasn't failed in a long time.

Refs: nodejs#20750
Trott added a commit to Trott/io.js that referenced this issue Mar 6, 2021
Trott added a commit to Trott/io.js that referenced this issue Mar 8, 2021
It hasn't failed for a long time.

Closes: nodejs#20750

PR-URL: nodejs#37631
Fixes: nodejs#20750
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Ruy Adorno <[email protected]>
@Trott Trott closed this as completed in cd4b095 Mar 8, 2021
danielleadams pushed a commit that referenced this issue Mar 16, 2021
It hasn't failed in a long time.

Refs: #20750

PR-URL: #37631
Fixes: #20750
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Ruy Adorno <[email protected]>
danielleadams pushed a commit that referenced this issue Mar 16, 2021
It hasn't failed for a long time.

Closes: #20750

PR-URL: #37631
Fixes: #20750
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Ruy Adorno <[email protected]>
targos pushed a commit that referenced this issue May 1, 2021
It hasn't failed in a long time.

Refs: #20750

PR-URL: #37631
Fixes: #20750
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Ruy Adorno <[email protected]>
targos pushed a commit that referenced this issue May 1, 2021
It hasn't failed for a long time.

Closes: #20750

PR-URL: #37631
Fixes: #20750
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Ruy Adorno <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. http2 Issues or PRs related to the http2 subsystem. windows Issues and PRs related to the Windows platform.
Projects
None yet
10 participants