Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add handshake tracking to kademlia protocol #52

Open
wants to merge 23 commits into
base: develop
Choose a base branch
from

Conversation

konradkonrad
Copy link

This addresses part 1) of #50.

Changes:

  • track state of ping pong handshake ("bonding") in Node (ping_recv + pong_recv)
  • add method for initiating bonding
  • added get_node for explicitly accessing nodes from kademlia routing table (so handshake tracking deals with the same instances)
  • delay find_node requests until after bonding
  • require successful bonding before answering find_node requests
  • refactored update() method for better understanding of the flow
  • adjusted tests to the new model

The `update()` method of `KademliaProtocol` is far too complex for sensible
refactoring, this is a first step to untangle all involved state transitions.
Due to bonding (ping/pong), it must be possible to delay sending of find_node
requests until bonding succeeded. Delayed requests will be send in `update()`
method after bonding.
KademliaProtocol delays `find_node` requests until bonding succeeded. This is
now reflected in the test.
At the end of bonding a node is not necessarily in the routing table. But if it
is, we ensure the bonding property (i.e. node identity between `node` in scope
and `node` from routing).
This fixes the `test_many` test to cater for bonding during the bootstrapping
loop.
Note: this passes all of the previous assertions, although I'm unsure whether
this really tests the desired behaviour.
At this point I am unsure why this test is set up this way: it triggers a pytest
timeout and is then marked xfail, while the test method is worded
`test_..._timeout`. This should probably be revisited, but I consider it out of
scope for this.
@coveralls
Copy link

coveralls commented Jan 18, 2017

Coverage Status

Coverage increased (+0.08%) to 87.664% when pulling ad430a9 on konradkonrad:fix_kademlia into bc0691c on ethereum:develop.


@property
def bonded(self):
return self.ping_recv and self.pong_recv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the protocol specify how long the bonding info is valid?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

informal spec from geth:
bonding state is valid ~indefinitely in protocol version 4, however: when a node is marked for replacement (new candidate for full bucket should be added) and the known, "bonded" node does not respond to the keep alive ping, it will be

  • replaced by the new candidate BUT
  • kept in the "known" state (i.e. bonded, will receive answers to queries) for up to 24h (outside the table)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something to add here: as far as I know, geth solves the out-of-table state tracking by persisting node state in a db. we don't have node persistence available. question is, if it was worth adding?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a simple pickle works for the data structure, and we already have infrastructure in devp2p to read config from a specified path, i'd say yes.

@@ -201,6 +208,15 @@ def to_binary(x): # left padded bit representation
return i - 1
raise Exception

def get_node(self, node):
if node in self:
Copy link
Member

@heikoheiko heikoheiko Jan 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unnecessarily looping the self.node twice.

alternative:
return ([n for n in self.nodes if n.id == node.id] or [None])[0]

@coveralls
Copy link

coveralls commented Jan 23, 2017

Coverage Status

Coverage decreased (-0.2%) to 87.426% when pulling e6a46eb on konradkonrad:fix_kademlia into bc0691c on ethereum:develop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants