This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Flaky CI tests #3036

Closed

achingbrain opened this issue May 14, 2020 · 6 comments
Labels: exp/intermediate (Prior experience is likely helpful), kind/bug (A bug in existing code, including security flaws), kind/wontfix

achingbrain (Member) commented May 14, 2020

There are several intermittent test failures in CI. Here are the most common, their causes, and the issues that, when closed, will resolve them:

ipfs:   1) interface-ipfs-core over ipfs-http-client tests
ipfs:        .pubsub.peers
ipfs:          "after all" hook for "should return peers for a topic - multiple peers":
ipfs:      FetchError: request to http://127.0.0.1:44199/api/v0/shutdown failed, reason: read ECONNRESET
ipfs:       at ClientRequest.<anonymous> (/home/travis/build/ipfs/js-ipfs/node_modules/node-fetch/lib/index.js:1455:11)
ipfs:       at Socket.socketErrorListener (_http_client.js:401:9)
ipfs:       at emitErrorNT (internal/streams/destroy.js:91:8)
ipfs:       at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
ipfs:       at process._tickCallback (internal/process/next_tick.js:63:19)
ipfs: FAILED TESTS:
ipfs:   interface-ipfs-core tests
ipfs:     .pubsub.peers
ipfs:       ✖ "after all" hook for "should return peers for a topic - multiple peers"
ipfs:         Chrome Headless 81.0.4044.138 (Linux x86_64)
ipfs:       Error: Uncaught TypeError: Cannot assign to read only property 'code' of object '' (file:/home/travis/build/ipfs/js-ipfs/node_modules/simple-peer/index.js:26)
ipfs:  Command failed with exit code 1: karma start /home/travis/build/ipfs/js-ipfs/node_modules/aegir/src/config/karma.conf.js --files-custom --log-level error --bail true --timeout 5000

What's happening here?

We are running an interface test over HTTP to a remote js-IPFS node. While the test is being torn down, a libp2p operation is still in progress on the remote node, which throws.

See: libp2p/js-libp2p-tcp#130
And the browser version: libp2p/js-libp2p-webrtc-star#222
Spurious simple-peer error: feross/simple-peer#660
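As an illustration of the race only (a minimal sketch, not the repo's actual teardown code; the after-all hook and the address constant below are placeholders), the shutdown request can be made tolerant of the remote node resetting the socket while it still has libp2p work in flight:

const fetch = require('node-fetch')

// placeholder address, taken from the failing run above
const httpApiUrl = 'http://127.0.0.1:44199'

after(async function () {
  this.timeout(10 * 1000)

  try {
    // ask the remote daemon to shut down over its HTTP API
    await fetch(`${httpApiUrl}/api/v0/shutdown`, { method: 'POST' })
  } catch (err) {
    // the shutdown races with in-flight libp2p operations on the remote
    // node, so the socket can be reset before a response is written
    if (err.code !== 'ECONNRESET') {
      throw err
    }
  }
})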


ipfs:   1) circuit
ipfs:        js-go-go
ipfs:          connect:
ipfs:      HTTPError: 
ipfs:     Error: MAC Invalid: 5f3b4afa2ed40725f40610a08bfea90bcf8ffa23199879c17ebd8dbe3d0175e7 != 9042b9bc86f918e1a03d7f1f0a6930dbb5cf8179088a7c517d1f304a2858f9bd
ipfs:         at /home/travis/build/ipfs/js-ipfs/node_modules/libp2p-secio/src/etm.js:33:15
ipfs:       at Object.errorHandler [as handleError] (/home/travis/build/ipfs/js-ipfs/packages/ipfs-http-client/src/lib/core.js:67:15)
ipfs:       at process._tickCallback (internal/process/next_tick.js:68:7)

What's happening here?

js-IPFS is rejecting connections from go-IPFS because the MAC of the incoming message is invalid.

See: libp2p/js-libp2p#310
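For context, the error string comes from secio's encrypt-then-MAC check: the receiver recomputes an HMAC over the incoming ciphertext and rejects the message if it doesn't match the MAC sent on the wire. A rough illustration of that check (not the libp2p-secio source; the hash choice here is an assumption):

const crypto = require('crypto')

function verifyMac (macKey, ciphertext, receivedMac) {
  // recompute the MAC over the ciphertext we actually received
  const expected = crypto.createHmac('sha256', macKey).update(ciphertext).digest()

  // any corruption, truncation or keying mismatch surfaces as 'MAC Invalid: a != b'
  if (expected.length !== receivedMac.length || !crypto.timingSafeEqual(expected, receivedMac)) {
    throw new Error(`MAC Invalid: ${receivedMac.toString('hex')} != ${expected.toString('hex')}`)
  }
}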


Fails intermittently in web workers for both Chrome and Firefox

ipfs: FAILED TESTS:
ipfs: interface-ipfs-core over ipfs-http-client tests against js-ipfs
ipfs: .swarm.peers
ipfs: ✖ should list peers only once even if they have multiple addresses
ipfs: Chrome Headless 87.0.4280.66 (Linux x86_64)
ipfs: HTTPError: Bad Request
ipfs: at HTTP.fetch (file:/home/travis/build/ipfs/js-ipfs/node_modules/ipfsd-ctl/node_modules/ipfs-utils/src/http.js:166:13)
ipfs: at async Client.start (file:/home/travis/build/ipfs/js-ipfs/node_modules/ipfsd-ctl/src/ipfsd-client.js:175:19)
ipfs: at async Factory.spawn (file:/home/travis/build/ipfs/js-ipfs/node_modules/ipfsd-ctl/src/factory.js:161:7)
ipfs: at async Context.eval (file:/home/travis/build/ipfs/js-ipfs/packages/interface-ipfs-core/src/swarm/peers.js:130:22)
ipfs: Command failed with exit code 1: karma start /home/travis/build/ipfs/js-ipfs/node_modules/aegir/src/config/karma.conf.js --files-custom

What's happening here?

Needs investigation

achingbrain added the kind/bug and exp/intermediate labels on May 14, 2020
Gozala (Contributor) commented Jun 18, 2020

@achingbrain @hugomrdias until these issues can be worked out, can we please consider making known intermittently failing tests non-fatal? Maybe they could still run, but not abort the run and not affect the overall ❌ / ✅ conclusion?

I understand that fixing the actual tests is the right way to go about things, but keeping the tree red in the meantime doesn't seem good. Between the various intermittent failures, the bundle size checks and everything else, I just can't get a ✅ from CI, even though I'm not changing anything in existing code.

achingbrain (Member Author) commented

I'd really rather not as some of these failures are caused by actual hard-to-replicate bugs. If we disable the tests they will never get fixed.

That said, I'm not 100% sold on bundle sizes failing the build: the size delta is valuable information, but to date it has never stopped us publishing anything.

Gozala (Contributor) commented Jun 22, 2020

I'd really rather not as some of these failures are caused by actual hard-to-replicate bugs. If we disable the tests they will never get fixed.

Can we consider an alternative where they aren't skipped / disabled but are marked as non-fatal? It's just that unless the tree is kept green, it's impossible to tell when new regressions are introduced.
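One possible shape for that (just a sketch of the idea, not something the project has agreed to) is to let mocha retry a known-flaky case a few times and point at the tracking issue, so the run only goes red if the test fails consistently:

describe('.swarm.peers', () => {
  it('should list peers only once even if they have multiple addresses', async function () {
    // known intermittent failure, tracked in https://github.com/ipfs/js-ipfs/issues/3036
    this.retries(3)

    // ...existing assertions unchanged...
  })
})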

achingbrain (Member Author) commented

If there are individual tests that fail intermittently, then potentially. The problem with the remaining errors above is that they can occur at pretty much any time, so you'd end up ignoring failures from most of the codebase.

Are you seeing any tests that fail repeatedly?

Gozala (Contributor) commented Jun 22, 2020

I have disabled a bunch of tests in #3081 because they would fail, but restarting them on CI would make them pass. I also verified that they were intermittent failures on master.

If there are tests that fail even 30% of the time, there's no way of telling whether a failure is a regression or not. Furthermore, it requires more coordination and attention from both reviewers and the author to make sure no new regressions are missed.

Which is to suggest that keeping master green as a policy might be a good idea. At the very least I think we should annotate failing tests with their corresponding issues, so that from the output alone one can tell whether a failure is new or something known.
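A sketch of that annotation idea (itKnownFlaky is a hypothetical helper, not something that exists in the repo): put the tracking issue in the test title, so a red line in the CI output is immediately recognisable as a known flake rather than a new regression:

const KNOWN_FLAKY_ISSUE = 'https://github.com/ipfs/js-ipfs/issues/3036'

// hypothetical wrapper: behaves exactly like `it` but tags the title with the issue link
function itKnownFlaky (title, fn) {
  it(`${title} (known flaky, see ${KNOWN_FLAKY_ISSUE})`, fn)
}

itKnownFlaky('should return peers for a topic - multiple peers', async function () {
  // ...existing test body...
})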

achingbrain (Member Author) commented

Closing as this repo is deprecated
