This repository has been archived by the owner on Aug 1, 2023. It is now read-only.

IPNS Pubsub tests failing on master for linux #71

Closed
jacobheun opened this issue May 31, 2019 · 17 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@jacobheun
Contributor

The test is only failing on linux. @vasco-santos any thoughts on what's going on here?

ipns-pubsub
       should publish the received record to a go node and a js subscriber should receive it:
     Uncaught AssertionError: expected [Error: waitFor time expired] to not exist
      at series (test/ipns-pubsub.js:145:26)
      at /home/travis/build/ipfs/interop/node_modules/async/internal/parallel.js:39:9
      at /home/travis/build/ipfs/interop/node_modules/async/internal/once.js:12:16
      at iterateeCallback (node_modules/async/internal/eachOfLimit.js:45:17)
      at /home/travis/build/ipfs/interop/node_modules/async/internal/onlyOnce.js:12:16
      at /home/travis/build/ipfs/interop/node_modules/async/internal/parallel.js:36:13
      at Timeout.setInterval (test/utils/wait-for.js:21:14)

https://travis-ci.com/ipfs/interop/jobs/202899201

@jacobheun jacobheun added the kind/bug A bug in existing code (including security flaws) label May 31, 2019
@vasco-santos
Member

I will have a look

@vasco-santos
Member

I have not been able to reproduce this on a Linux machine. I will look into installing the same Linux OS that runs in CI.

@vasco-santos
Member

So, I was finally able to debug the problem and find out what broke these tests.

The PR with tests for adding and modifying mfs hamt shards, ipfs/interop#54, bumped the cids module from version ~0.5.7 to ~0.7.1, and that dependency bump is what broke this test.

I went through the history of the cids module since version ~0.5.7, and the commit that broke this test was multiformats/js-cid/commit/0e94b5531c98b4bde87eef9ee7b76e06f8a4f6a7 in multiformats/js-cid#77.

I tested removing those validations with multiformats/js-cid/commit/32eaf3fea10bb13eec7c92e64eb90a02342dd73c, installed that commit in interop with ipfs/interop#72, and the tests go green.

What is really strange about this bug is that it only happens on Linux, not on Mac or Windows, when the CID should be the same on all of them. In IPNS, a Key (from interface-datastore) is created from the B58String of the PeerId. Then, specifically for IPNS over pubsub, that key is converted to base64url (ipfs/js-datastore-pubsub/blob/master/src/utils.js#L14) to create the topic.
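For anyone following along, here is a rough sketch of the key-to-topic conversion described above. It is not the js-datastore-pubsub code: the function names and the `/record/` prefix are assumptions for illustration, and it uses only Node's built-in `Buffer` rather than the multibase module.

```javascript
// Sketch only: NAMESPACE and all function names here are assumptions
// for illustration, not the js-datastore-pubsub implementation.
const NAMESPACE = '/record/'

// Encode raw key bytes as unpadded base64url (RFC 4648, section 5).
function toBase64Url (bytes) {
  return Buffer.from(bytes).toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '')
}

// Decode unpadded base64url back into bytes.
function fromBase64Url (text) {
  const base64 = text.replace(/-/g, '+').replace(/_/g, '/')
  return Buffer.from(base64, 'base64')
}

// Build the pubsub topic for a datastore key.
function keyToTopic (key) {
  return NAMESPACE + toBase64Url(key)
}

// Recover the datastore key bytes from a topic.
function topicToKey (topic) {
  if (!topic.startsWith(NAMESPACE)) {
    throw new Error('topic is not in the expected namespace')
  }
  return fromBase64Url(topic.slice(NAMESPACE.length))
}
```

The point of the sketch is that the conversion is a pure function of the key bytes, so the same PeerId should produce the same topic on every OS.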

Any ideas @jacobheun @olizilla @vmx ?

@vmx
Member

vmx commented Jun 4, 2019

@vasco-santos The only idea I have is that some module has a dependency on a cids 0.5.x and it somehow bubbles up that CID. Those CIDs don't have a multibaseName field yet, which would make the other.multibaseName !== 'base58btc' check fail.

Could you check the package-lock.json file to see if something is using an old version of cids (once you've run npm install, an npm ls cids should do the trick)?

If that's the problem, perhaps the Linux CI machine had some modules cached differently than the other VMs to make it fail.
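To illustrate the theory above, here is a small sketch of how a check like `other.multibaseName !== 'base58btc'` would reject an object produced by an older release. The `validateCidV0` and `isValid` functions and the plain objects are hypothetical stand-ins, not the cids source.

```javascript
// Hypothetical stand-in for the stricter validation; not the cids source.
function validateCidV0 (other) {
  if (other.version !== 0) return
  if (other.codec !== 'dag-pb') {
    throw new Error('codec must be dag-pb for CIDv0')
  }
  if (other.multibaseName !== 'base58btc') {
    throw new Error('multibaseName must be base58btc for CIDv0')
  }
}

// Shape assumed for a newer release: the field is present.
const fromNewRelease = { version: 0, codec: 'dag-pb', multibaseName: 'base58btc' }

// Shape assumed for an older release: the field is missing, so it reads as undefined.
const fromOldRelease = { version: 0, codec: 'dag-pb' }

// Small helper that turns the throwing check into a boolean.
function isValid (cid) {
  try {
    validateCidV0(cid)
    return true
  } catch (err) {
    return false
  }
}
```

With these shapes, `isValid(fromNewRelease)` is true and `isValid(fromOldRelease)` is false, which is the failure mode you would see if an old cids copy leaks into the dependency tree.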

@vasco-santos
Member

vasco-santos commented Jun 4, 2019

I looked at it, and ipfsd-ctl has an older version as a result of an older ipfs-http-client. It is strange that it only happens on Linux, but I will update and test it

@Stebalien
Member

It doesn't look like #73 fixed this.

@vasco-santos
Member

I re-ran a CI job and apparently it is failing now, but on Mac. However, on my own Mac it works fine. Can anyone else double check?

@Stebalien
Member

It's failing locally on linux for me. Also on CircleCI.

@vasco-santos
Member

@Stebalien to rule out other factors, did you remove the package-lock.json file and reinstall the dependencies? I am not able to reproduce this except for macOS on Travis CI (Linux on Travis is OK). I restarted CI for #36

@Stebalien
Member

Locally, no. However, this is failing in Circle (which I believe redeploys each time). I'll try again locally.

@Stebalien
Member

Yep, still broken on master with a fresh npm install && npm run test. It's probably a timing issue.

@vasco-santos
Member

I have not been able to reproduce this anywhere except on the Mac CI, not even on a Linux VM. @hugomrdias can you have a look?

@Stebalien Stebalien mentioned this issue Jul 12, 2019
51 tasks
@hugomrdias
Member

hugomrdias commented Jul 16, 2019

For me this also fails on macOS, and it seems to be a libp2p problem:

 libp2p:floodsub:error Error: Dial is currently blacklisted for this peer
  libp2p:floodsub:error     at createError (/Users/hugomrdias/code/pl/interop/node_modules/err-code/index.js:4:44)
  libp2p:floodsub:error     at ERR_BLACKLISTED (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-switch/src/errors.js:8:26)
  libp2p:floodsub:error     at Queue.add (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-switch/src/dialer/queue.js:89:33)
  libp2p:floodsub:error     at DialQueueManager.add (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-switch/src/dialer/queueManager.js:124:17)
  libp2p:floodsub:error     at _dial (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-switch/src/dialer/index.js:37:22)
  libp2p:floodsub:error     at Switch.dial (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-switch/src/dialer/index.js:95:5)
  libp2p:floodsub:error     at _getPeerInfo (/Users/hugomrdias/code/pl/interop/node_modules/libp2p/src/index.js:267:20)
  libp2p:floodsub:error     at Node._getPeerInfo (/Users/hugomrdias/code/pl/interop/node_modules/libp2p/src/get-peer-info.js:64:5)
  libp2p:floodsub:error     at Node.dialProtocol (/Users/hugomrdias/code/pl/interop/node_modules/libp2p/src/index.js:264:10)
  libp2p:floodsub:error     at FloodSub._dialPeer (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-pubsub/src/index.js:149:17)
  libp2p:floodsub:error     at Node.emit (events.js:194:15)
  libp2p:floodsub:error     at Node.emit (/Users/hugomrdias/code/pl/interop/node_modules/libp2p/src/index.js:202:13)
  libp2p:floodsub:error     at Switch.Libp2p._switch.on (/Users/hugomrdias/code/pl/interop/node_modules/libp2p/src/index.js:83:14)
  libp2p:floodsub:error     at Switch.emit (events.js:194:15)
  libp2p:floodsub:error     at ConnectionManager.add (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-switch/src/connection/manager.js:38:21)
  libp2p:floodsub:error     at conn.getPeerInfo (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-switch/src/connection/manager.js:211:36)
  libp2p:floodsub:error     at f (/Users/hugomrdias/code/pl/interop/node_modules/once/once.js:25:25)
  libp2p:floodsub:error     at Connection.conn.getPeerInfo (/Users/hugomrdias/code/pl/interop/node_modules/libp2p-switch/src/connection/manager.js:193:13)
  libp2p:floodsub:error     at process._tickCallback (internal/process/next_tick.js:68:7) +0ms

This happens right before https://github.com/ipfs/interop/blob/master/test/ipns-pubsub.js#L91-L92

@vasco-santos
Member

vasco-santos commented Jul 16, 2019

Ok, so I have thought about this and how we could get the CI green for the release now.

Thanks for the debug stack @hugomrdias , as I could not replicate this locally.

TLDR: we introduced a dial queue in js-libp2p-switch, which is intended (among other things) to blacklist abusive peers. In this test, unlike the interop tests for pubsub, we had the default bootstrap list as well as MDNS discovery enabled. As a result, we were discovering the peer in several ways and trying to dial it, while also dialing it explicitly to guarantee that the peers know each other before the IPNS over pubsub tests. Because of this, libp2p-switch was marking the peer as blacklisted and we were not able to dial it.

I created #77, which uses an empty bootstrap list for the js node and disables MDNS. This way, we will have the CI green here and still test the interop over the wire. Moreover, I will create an issue in js-libp2p-switch so that we rethink the way we blacklist nodes next week, once @jacobheun is back.
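As a rough idea of the kind of change involved (a sketch, not the actual diff in #77), the js node's config could be overridden along these lines. The `withIsolation` helper is hypothetical, and the `Bootstrap` and `Discovery.MDNS.Enabled` key names are my understanding of the js-ipfs config format, so check them against the version in use.

```javascript
// Sketch only: withIsolation is a hypothetical helper, and the key names
// are assumed to match the js-ipfs config format.
function withIsolation (baseConfig = {}) {
  const discovery = baseConfig.Discovery || {}
  return {
    ...baseConfig,
    // No bootstrap peers, so the node only knows peers it dials explicitly.
    Bootstrap: [],
    Discovery: {
      ...discovery,
      // Disable local network discovery so the peer is not found a second way.
      MDNS: { ...(discovery.MDNS || {}), Enabled: false }
    }
  }
}

// Example input: a config with a placeholder bootstrap entry and MDNS on.
const exampleConfig = withIsolation({
  Bootstrap: ['/placeholder/bootstrap/address'],
  Discovery: { MDNS: { Enabled: true, Interval: 10 } }
})
```

With a config like this, the only dial to the other peer in the test is the explicit one, which should avoid repeated dials to the same peer piling up in the dial queue.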

@Stebalien
Member

I'm still getting failures in go-ipfs:

https://circleci.com/gh/ipfs/go-ipfs/18238?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

The pin error looks unrelated.

@vasco-santos
Member

Can you try again after the package-lock.json PR gets in? I have no idea why that could still be happening.

@Stebalien
Member

All fixed. Thanks!

5 participants