This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Nodes disconnect - can't retrieve files after period of time #920

Closed
lindybrits opened this issue Jul 22, 2017 · 11 comments
Labels
  • kind/bug: A bug in existing code (including security flaws)
  • P1: High: Likely tackled by core team if no one steps up
  • status/ready: Ready to be worked

Comments

@lindybrits

lindybrits commented Jul 22, 2017

We have three servers running js-ipfs via node.js applications. As soon as the applications start, the servers can retrieve files from one another. After a couple of minutes, however, it seems as though they disconnect, as files can't be pulled. The time it takes for this issue to arise ranges anywhere from 10 to 30 minutes; it is not quite clear.

Our config contains addresses in the form /ip4/ip.address/tcp/4002/ipfs/Qm... Server 1 and server 2 represent the other two nodes the node should connect to.

Our config is as follows:

Addresses: {
  Swarm: [
    "/ip4/0.0.0.0/tcp/4002",
    "/ip4/server1.ip.address/tcp/4002/ipfs/Qm...",
    "/ip4/server2.ip.address/tcp/4002/ipfs/Qm..."
  ]
},
Bootstrap: [
  "/ip4/server1.ip.address/tcp/4002/ipfs/Qm...",
  "/ip4/server2.ip.address/tcp/4002/ipfs/Qm..."
]

Any help would be appreciated!

@daviddias
Member

That's odd. Do you see any error in the logs?

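Note: it seems that you are adding the addresses you want to connect to alongside the addresses to listen on (Addresses.Swarm). Addresses.Swarm should only contain listen addresses; peers to dial belong in Bootstrap. Your config should look like: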

Addresses: {
  Swarm: [
    "/ip4/0.0.0.0/tcp/4002"
  ]
},
Bootstrap: [
  "/ip4/server1.ip.address/tcp/4002/ipfs/Qm...",
  "/ip4/server2.ip.address/tcp/4002/ipfs/Qm..."
]
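
To make the distinction concrete, here is a minimal sketch of a check that separates listen addresses from dial targets. It rests on a heuristic that is an assumption, not a js-ipfs rule: any Addresses.Swarm entry carrying an /ipfs/<peer id> suffix is treated as a peer to dial and moved to Bootstrap. The helper name splitSwarmAddresses is hypothetical and not part of js-ipfs.

```javascript
// Hypothetical helper (not part of js-ipfs): move dial targets out of
// Addresses.Swarm and into Bootstrap.
// Assumption: a configured listen address has no /ipfs/<peer id> suffix.
function splitSwarmAddresses(config) {
  const swarm = (config.Addresses && config.Addresses.Swarm) || [];
  const isDialTarget = (addr) => addr.includes("/ipfs/");

  const listen = swarm.filter((addr) => !isDialTarget(addr));
  const misplaced = swarm.filter(isDialTarget);

  // De-duplicate while keeping the original Bootstrap order.
  const bootstrap = Array.from(
    new Set([...(config.Bootstrap || []), ...misplaced])
  );

  return Object.assign({}, config, {
    Addresses: Object.assign({}, config.Addresses, { Swarm: listen }),
    Bootstrap: bootstrap
  });
}

// Example using the placeholder addresses from this thread.
const fixed = splitSwarmAddresses({
  Addresses: {
    Swarm: [
      "/ip4/0.0.0.0/tcp/4002",
      "/ip4/server1.ip.address/tcp/4002/ipfs/Qm..."
    ]
  },
  Bootstrap: ["/ip4/server1.ip.address/tcp/4002/ipfs/Qm..."]
});
console.log(JSON.stringify(fixed, null, 2));
```

The function returns a new object instead of mutating its input, so the original config can still be inspected when comparing before and after.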

@daviddias
Member

@lindybrits just pushed a change to railing that should solve your issue.

To try it out, make sure to:

  • rm -r node_modules
  • npm install all the modules again
  • Use the latest js-ipfs (npm install ipfs/js-ipfs)

@lindybrits
Author

@diasdavid thanks for the speedy feedback as always! :) I'll give it all a go and get back to you on the progress.

@lindybrits
Author

@diasdavid I am still experiencing the issue of not being able to retrieve files after a certain amount of time. I do run the ipfs.isOnline() command and afterward I do ipfs.object.get(hash, {enc: "base58"}, (err, node) => { ... }). It does not show an error, however. It seems like it is stuck inside this function without returning the file.

I have changed the config to contain "/ip4/0.0.0.0/tcp/4002" in the Swarm and only the other nodes in Bootstrap. I have also run the three commands as suggested.

@lindybrits
Author

Just to make extra sure - in the config file, does this apply (see the Qm... part):

Addresses: {
  Swarm: [
    "/ip4/0.0.0.0/tcp/4002"
  ]
},
Bootstrap: [
  "/ip4/server1.ip.address/tcp/4002/ipfs/Qm...of server1",
  "/ip4/server2.ip.address/tcp/4002/ipfs/Qm...of server2"
]

@lindybrits
Author

Please have a look at this strange output for peers in js-ipfs in the file attached. The situation here is we have three servers (Server 2, Server 3, Server 4). What I have recorded is the peers printed in js-ipfs for Server 3 and Server 4.

Is this normal?

Server Peers IPFS.txt

@daviddias
Member

@lindybrits how are you generating that list? Do you use ipfs.swarm.peers?

Could you also share the commit version you are using for your js-ipfs nodes?
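
If the list does come from ipfs.swarm.peers, one way to see when peers drop is to poll it and compare snapshots. The following is only a sketch: peerAddrs, diffPeers and watchPeers are hypothetical helper names, and it assumes each entry returned by ipfs.swarm.peers has an addr field that stringifies to a multiaddr. The addresses in the example are made up.

```javascript
// Hypothetical helpers for tracking which peers drop over time.
// Assumption: each entry returned by ipfs.swarm.peers has an `addr`
// field that stringifies to a multiaddr.
function peerAddrs(peers) {
  return peers.map((p) => String(p.addr)).sort();
}

// Compare two snapshots of peer addresses.
function diffPeers(previous, current) {
  return {
    joined: current.filter((a) => !previous.includes(a)),
    left: previous.filter((a) => !current.includes(a))
  };
}

// Poll an already-started node and log the changes between snapshots.
function watchPeers(ipfs, intervalMs) {
  let previous = [];
  return setInterval(() => {
    ipfs.swarm.peers((err, peers) => {
      if (err) {
        return console.error("swarm.peers failed:", err);
      }
      const current = peerAddrs(peers);
      console.log(new Date().toISOString(), diffPeers(previous, current));
      previous = current;
    });
  }, intervalMs);
}

// Example with made-up snapshots:
const change = diffPeers(
  ["/ip4/10.0.1.11/tcp/4002"],
  ["/ip4/10.0.1.12/tcp/4002"]
);
console.log(change);
```

Calling watchPeers(ipfs, 60000) on each server and keeping the output would give a timestamped record to compare against the moment file retrieval stops working.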

One trick that helps to debug is enabling logs by:

» DEBUG=jsipfs:*,libp2p* jsipfs daemon

You should start seeing something like:

Initializing daemon...
  jsipfs:http-api starting +0ms
  jsipfs:state -> stopped +69ms
  jsipfs:state -> starting +1ms
  libp2p:swarm:dialer create: 8 peer limit, 10000 dial timeout +10ms
  libp2p:swarm:transport adding TCP +3ms
  libp2p:swarm:transport adding WebSockets +0ms
  libp2p:tcp:listen Listening on 4002 0.0.0.0 +4ms
Swarm listening on /libp2p-webrtc-star/dns4/star-signal.cloud.ipfs.team/wss/ipfs/QmVD9UTTvq7XpfmiAg4EooFZrfESxbpigiKqrpBQZQ3i4Q
Swarm listening on /ip4/127.0.0.1/tcp/4003/ws/ipfs/QmVD9UTTvq7XpfmiAg4EooFZrfESxbpigiKqrpBQZQ3i4Q
Swarm listening on /ip4/127.0.0.1/tcp/4002/ipfs/QmVD9UTTvq7XpfmiAg4EooFZrfESxbpigiKqrpBQZQ3i4Q
Swarm listening on /ip4/192.168.86.41/tcp/4002/ipfs/QmVD9UTTvq7XpfmiAg4EooFZrfESxbpigiKqrpBQZQ3i4Q
Swarm listening on /ip4/172.16.153.1/tcp/4002/ipfs/QmVD9UTTvq7XpfmiAg4EooFZrfESxbpigiKqrpBQZQ3i4Q
Swarm listening on /ip4/192.168.43.1/tcp/4002/ipfs/QmVD9UTTvq7XpfmiAg4EooFZrfESxbpigiKqrpBQZQ3i4Q
Swarm listening on /ip4/172.18.10.46/tcp/4002/ipfs/QmVD9UTTvq7XpfmiAg4EooFZrfESxbpigiKqrpBQZQ3i4Q
  jsipfs:state -> running +16ms
  jsipfs:http-api fetching config +0ms
API is listening on: /ip4/127.0.0.1/tcp/5002
Gateway (readonly) is listening on: /ip4/127.0.0.1/tcp/9090
  jsipfs:http-api done null +163ms
Daemon is ready
  libp2p:swarm:dial dialing QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ +10s
  libp2p:swarm:transport dialing TCP [ '/ip4/104.131.131.82/tcp/4001/ipfs/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ' ] +1ms

daviddias added the kind/bug and P1 labels on Jul 23, 2017
@Jacodelange

Jacodelange commented Jul 24, 2017

@diasdavid I am working with Lindy on this.

The version that we are using is "[email protected]".

These are the logs for node 2:

  jsipfs:state -> starting +39ms
  libp2p:swarm:dialer create: 8 peer limit, 10000 dial timeout +11ms
  libp2p:swarm:transport adding TCP +7ms
  libp2p:tcp:listen Listening on 4002 0.0.0.0 +12ms
  libp2p:swarm:transport adding WebSockets +11ms
Swarm listening on /ip4/127.0.0.1/tcp/4002/ipfs/Qm...of server2
Swarm listening on /ip4/10.0.1.11/tcp/4002/ipfs/Qm...of server2
  jsipfs:state -> running +12ms
  libp2p:tcp:listen new connection /ip4/server1.ip.address/tcp/59038 +4s
  libp2p:secio 1. propose - start +18ms
  libp2p:secio 1. propose - writing proposal +0ms
  libp2p:secio 1. propose - reading proposal <Buffer 0a 10 bd 09 d1 dd 44 df 0b 45 79 1d 3f 0b 45 c4 2e 66 12 ab 02 08 00 12 a6 02 30 82 01 22 30 0d 06 09 2a 86 48 86 f7 0d 01 01 01 05 00 03 82 01 0f 00 ... > +2ms
  libp2p:secio 1.1 identify +5ms
  libp2p:secio 1.1 identify - Qm...of server2 - identified remote peer as Qm...of server1 +7ms
  libp2p:secio 1.2 selection +3ms
  libp2p:secio 1. propose - finish +2ms
  libp2p:secio 2. exchange - start +1ms
  libp2p:secio 2. exchange - writing exchange +0ms
  libp2p:secio 2. exchange - reading exchange +28ms
  libp2p:secio 2.1. verify +0ms
  libp2p:secio 2.1. verify - signature verified +1ms
  libp2p:secio 2.2. keys +7ms
  libp2p:secio 2.3. mac + cipher +3ms
  libp2p:secio 2. exchange - finish +1ms
  libp2p:secio 3. finish - start +1ms
  libp2p:secio 3. finish - finish +6ms
  libp2p:tcp:listen new connection /ip4/server3.ip.address/tcp/45720 +875ms
  libp2p:secio 1. propose - start +114ms
  libp2p:secio 1. propose - writing proposal +1ms
  libp2p:secio 1. propose - reading proposal <Buffer 0a 10 ... 0f 00 ... > +113ms
  libp2p:secio 1.1 identify +1ms
  libp2p:secio 1.1 identify - Qm....rL - identified remote peer as Qm....KG +1ms
  libp2p:secio 1.2 selection +1ms
  libp2p:secio 1. propose - finish +1ms
  libp2p:secio 2. exchange - start +5ms
  libp2p:secio 2. exchange - writing exchange +0ms
  libp2p:secio 2. exchange - reading exchange +129ms
  libp2p:secio 2.1. verify +0ms
  libp2p:secio 2.1. verify - signature verified +4ms
  libp2p:secio 2.2. keys +2ms
  libp2p:secio 2.3. mac + cipher +4ms
  libp2p:secio 2. exchange - finish +2ms
  libp2p:secio 3. finish - start +0ms
  libp2p:secio 3. finish - finish +111ms
  libp2p:swarm:dial dialing Qm...of server1 +5s
  libp2p:swarm:dial dialing Qm...of server3 +1ms

After that the node keeps on outputting:

  libp2p:swarm:dial dialing Qm...of server1 +5s
  libp2p:swarm:dial dialing Qm...of server3 +1ms
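
To quantify how often each peer is redialed, the captured log can be summarized with a small script. This is a sketch only: countDials is a hypothetical helper, and it assumes the dial lines keep the shape shown above ("libp2p:swarm:dial dialing <peer id> +<elapsed>"). Repeated dials on their own do not prove that a connection dropped.

```javascript
// Hypothetical log summarizer: count "dialing" lines per peer.
// Assumption: lines look like "libp2p:swarm:dial dialing <peer id> +5s".
function countDials(logText) {
  const counts = {};
  const pattern = /libp2p:swarm:dial dialing (.+?)\s*\+\d+[a-z]*$/;
  for (const line of logText.split("\n")) {
    const match = pattern.exec(line.trim());
    if (match) {
      const peer = match[1];
      counts[peer] = (counts[peer] || 0) + 1;
    }
  }
  return counts;
}

// Example using the peer ID from the daemon log earlier in this thread.
const sample = [
  "  libp2p:swarm:dial dialing QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ +10s",
  "  libp2p:swarm:transport dialing TCP [ ... ] +1ms",
  "  libp2p:swarm:dial dialing QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ +5s"
].join("\n");
const dials = countDials(sample);
console.log(dials);
```
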

So libp2p is dialing the different servers the whole time, but after a while, if we run:

ipfs.object.get(hash, { enc: "base58" }, (err, node) => {
  if (err) {
    // Return early so resolve() is never reached with an undefined node
    return reject(["error", err]);
  }
  resolve(["success", node.toJSON().data]);
});

IPFS never gets the object from its peers.
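
Because the callback is never invoked, neither reject nor resolve runs and the hang stays silent. A timeout wrapper at least turns it into a visible error. The sketch below is an illustration, not a js-ipfs feature: getWithTimeout is a hypothetical helper, and it does not cancel the underlying request, it only stops waiting for it.

```javascript
// Hypothetical wrapper: reject if a callback-style call never returns.
function getWithTimeout(fetch, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error("timed out after " + timeoutMs + " ms"));
    }, timeoutMs);

    fetch((err, result) => {
      clearTimeout(timer);
      if (err) {
        return reject(err);
      }
      resolve(result);
    });
  });
}

// Usage with js-ipfs (assumes `ipfs` is a started node and `hash` is known):
// getWithTimeout((cb) => ipfs.object.get(hash, { enc: "base58" }, cb), 30000)
//   .then((node) => console.log(node.toJSON().data))
//   .catch((err) => console.error("object.get failed:", err.message));

// Stand-in that never calls back, mimicking the hang described above.
const hung = getWithTimeout(() => {}, 50);
hung.catch((err) => console.log(err.message)); // prints "timed out after 50 ms"
```

Logging the timeout together with the current ipfs.swarm.peers output would show whether the node still believes it is connected at the moment the request stalls.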

@Jacodelange

@diasdavid If I start jsipfs daemon on the command line and leave it open for 30+ minutes, jsipfs swarm peers shows that all of the nodes are connected, but when I request a file the function gets stuck.

daviddias added the status/ready label on Jul 27, 2017
@daviddias
Member

Hi! Still seeing this issue?

@daviddias
Member

Closing for now. Let me know if the situation still occurs.


3 participants