feat: SDK for redundant usage of filter/lightpush #1463

fryorcraken · 2023-08-08T04:51:09Z

Planned start date:
Due date:

Summary

Implement a scoring or other mechanism to enable js-waku nodes to:

1. Rely on random internet peers with minimal degradation of the experience
2. Subsequently, save peers in local storage and use them upon start-up

Implementing (2) without (1) would mean that, upon start-up, a node would connect to previously found peers instead of bootstrap (Waku fleet) peers. Such peers may not be reliable and could lead to a full degradation of the experience.
A js-waku node needs to determine whether it can avoid using bootstrap peers.

Also note:

Bootstrap peers should still be used for the store service until we have a distributed service.
Peers passed as a static list should be considered bootstrap peers.

Acceptance Criteria

A js-waku node can use services (filter, light push) from several remote nodes at the same time: feat: lightpush & filter send requests to multiple peers #1779
Some (scoring) mechanism to enable a local peer to determine whether a remote peer is reliable enough to be used for filter and light push services
Save peers in local storage and use them upon next restart for filter and light push, if they are deemed reliable, to increase decentralization and reduce load of bootstrap fleet.
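For light push, using several remote nodes at once might look roughly like the sketch below: the same request is sent to N peers in parallel and each outcome is recorded. `PeerId` (a plain string here), `SendFn` and `sendToPeers` are hypothetical placeholders, not the js-waku API.

```typescript
// Hypothetical: a plain string stands in for a libp2p PeerId.
type PeerId = string;

// Hypothetical transport: sends one light push request to one peer.
type SendFn = (peer: PeerId, payload: Uint8Array) => Promise<void>;

interface FanOutResult {
  successes: PeerId[];
  failures: { peer: PeerId; error: unknown }[];
}

// Sends the same payload to every peer in parallel and reports which peers
// accepted it and which rejected it. No single failure aborts the others.
async function sendToPeers(
  peers: PeerId[],
  payload: Uint8Array,
  send: SendFn
): Promise<FanOutResult> {
  const settled = await Promise.allSettled(
    peers.map((peer) => send(peer, payload))
  );
  const result: FanOutResult = { successes: [], failures: [] };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      result.successes.push(peers[i]);
    } else {
      result.failures.push({ peer: peers[i], error: outcome.reason });
    }
  });
  return result;
}
```

What counts as "sent" is left to the caller: one acknowledgement may be enough, or a stricter caller could require a quorum of `successes`.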

Notes

To ensure API consumers do not receive duplicate messages when several nodes are used for filter, caching of message IDs (MUID) will be necessary.
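A minimal deduplication sketch, assuming each incoming message already carries a unique ID string (how the MUID is derived is left open here). The class name and the bounded FIFO eviction are illustrative choices, not an existing js-waku component.

```typescript
// Remembers recently seen message IDs so that a message delivered by several
// filter peers is passed to the application only once.
class MessageDeduplicator {
  private readonly seen = new Set<string>();
  private readonly order: string[] = [];
  private readonly capacity: number;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  // Returns true the first time an ID is observed, false for duplicates.
  accept(messageId: string): boolean {
    if (this.seen.has(messageId)) {
      return false;
    }
    this.seen.add(messageId);
    this.order.push(messageId);
    // Evict the oldest ID once the cache is full, keeping memory bounded.
    if (this.order.length > this.capacity) {
      const oldest = this.order.shift();
      if (oldest !== undefined) {
        this.seen.delete(oldest);
      }
    }
    return true;
  }
}
```

The capacity needs to cover the window in which the same message can arrive from different peers; a message re-delivered after eviction would be seen as new.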

Tasks

Filter subscribe on 2 different nwaku nodes doesn't work reliable #1606

RAID (Risks, Assumptions, Issues and Dependencies)

Depends on @waku-org/research to help/deliver the scoring/other logic.

fryorcraken · 2023-08-08T04:52:57Z

Some ideas for the logic: #914 (comment)

weboko · 2023-08-10T10:50:47Z

@danisharora099 to look into how to gauge a peer's reliability (scoring) using the existing nwaku API (possibly via a libp2p protocol)

fryorcraken · 2023-08-15T07:39:00Z

@danisharora099 Shall we add a latency check as part of this milestone, where we select the peers with the lowest latency?
Maybe we could even have logic that pings every new peer discovered via PX and, if a faster peer is found, starts using it (in addition to the other peers).

Maybe latency can be part of some scoring mechanism? Not sure.
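A sketch of latency-based selection, assuming a `measureLatency` probe that resolves to the round-trip time in milliseconds and rejects on failure. Backing it with a libp2p ping is plausible but is an assumption, and that wiring is not shown; `selectFastestPeers` is a hypothetical name.

```typescript
type PeerId = string;

// Hypothetical probe: resolves to round-trip time in ms, rejects on failure.
type LatencyProbe = (peer: PeerId) => Promise<number>;

// Probes all candidates in parallel and returns the `count` fastest ones.
// Peers whose probe fails are dropped rather than ranked last.
async function selectFastestPeers(
  candidates: PeerId[],
  count: number,
  measureLatency: LatencyProbe
): Promise<PeerId[]> {
  const results = await Promise.allSettled(
    candidates.map(async (peer) => ({ peer, rtt: await measureLatency(peer) }))
  );
  const reachable: { peer: PeerId; rtt: number }[] = [];
  for (const r of results) {
    if (r.status === "fulfilled") reachable.push(r.value);
  }
  reachable.sort((a, b) => a.rtt - b.rtt);
  return reachable.slice(0, count).map((r) => r.peer);
}
```

For the PX idea, the same function could be rerun whenever new peers are discovered, adding any peer that beats the slowest one currently in use.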

jm-clius · 2023-08-15T17:33:03Z

Great initiative to look at some of these questions, especially as they relate to filter usage!
Filter relies in many ways on the same building blocks as relay for its reliability, but in a modular, "pick your own tradeoffs" way:

redundancy (for relay in full message connections, for filter in subscriptions)
randomness (selecting random peers for connection/subscription, preferably with some peer cycling)
periodically checking that you received all messages against a cache (this doesn't really exist yet for filter, but you could imagine using occasional store queries to achieve something similar)

As such it will be helpful to provide a configurable "reliability" SDK on top of filter for projects without the scope to build these features from the ground up with filter.

A js-waku node can use services (filter, light push) from several remote nodes at the same time.

Indeed. For now I'd suggest just selecting random nodes in the network as filter/lightpush peers, with some redundancy factor built in.
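Random selection with a redundancy factor could be as small as the sketch below, which uses a partial Fisher-Yates shuffle. `selectRandomPeers` and `redundancy` are hypothetical names; the random source is injectable so callers (and tests) can control it.

```typescript
type PeerId = string;

// Picks `redundancy` distinct peers uniformly at random using a partial
// Fisher-Yates shuffle. Does not mutate the input array.
function selectRandomPeers(
  peers: PeerId[],
  redundancy: number,
  random: () => number = Math.random
): PeerId[] {
  const pool = [...peers];
  const n = Math.min(redundancy, pool.length);
  for (let i = 0; i < n; i++) {
    // Choose an index in [i, pool.length) and swap it into position i.
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```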

Some (scoring) mechanism to enable a local peer to determine whether a remote peer is reliable enough to be used for filter and light push services

I wouldn't necessarily bring scoring into this. Relay/gossipsub, for example, simply chooses to eventually disconnect from peers that provide less value than others (peer scoring may be too long-lived and complex if there's simply a temporary connectivity issue). You could, for example, have n filter subscriptions, periodically review whether some peers have "missed" more messages than others, and cycle those.
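The "review and cycle" idea could be sketched as follows: for each subscribed peer, count how many of the messages seen across all subscriptions it failed to deliver, and flag the worst offenders for replacement. The `DeliveryLog` shape and the `maxMissRatio` threshold are hypothetical tuning choices, not agreed values.

```typescript
type PeerId = string;

// For each peer, the set of message IDs it delivered during the review window.
type DeliveryLog = Map<PeerId, Set<string>>;

// Returns peers whose share of missed messages exceeds `maxMissRatio`,
// where "missed" means seen from some other peer but not from this one.
function peersToCycle(log: DeliveryLog, maxMissRatio: number): PeerId[] {
  // Union of every message ID delivered by any peer.
  const union = new Set<string>();
  for (const ids of log.values()) {
    for (const id of ids) union.add(id);
  }
  if (union.size === 0) return [];

  const toCycle: PeerId[] = [];
  for (const [peer, ids] of log) {
    // Each peer's set is a subset of the union, so the difference is its misses.
    const missed = union.size - ids.size;
    if (missed / union.size > maxMissRatio) toCycle.push(peer);
  }
  return toCycle;
}
```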

Save peers in local storage and use them upon next restart for filter and light push, if they are deemed reliable, to increase decentralization and reduce load of bootstrap fleet.

I wouldn't expect the DNS lookups, followed by the initial peer exchange, to take very long. It's probably a good idea to cache some peers, but I would try to flush out that cache as soon as possible after startup and replace each of these subscriptions with a new one to a random node. This prevents a node from always using the same peers and thus being vulnerable to bias.
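Flushing the start-up cache could look like this sketch: once fresh peers are discovered, each subscription still pointing at a cached peer is moved to a randomly chosen fresh peer that is not already in use. `replaceCachedPeers` and its parameters are hypothetical.

```typescript
type PeerId = string;

// Replaces every cached peer in `active` with a random fresh peer that is not
// already in use. A cached peer is kept only if no fresh replacement remains.
function replaceCachedPeers(
  active: PeerId[],
  cached: Set<PeerId>,
  discovered: PeerId[],
  random: () => number = Math.random
): PeerId[] {
  const inUse = new Set(active);
  // Candidate pool: fresh peers that are neither cached nor already used.
  const pool = discovered.filter((p) => !cached.has(p) && !inUse.has(p));

  return active.map((peer) => {
    if (!cached.has(peer) || pool.length === 0) return peer;
    // Remove the chosen peer from the pool so it is not assigned twice.
    const idx = Math.floor(random() * pool.length);
    const [replacement] = pool.splice(idx, 1);
    return replacement;
  });
}
```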

Note that @siphiuel has been doing similar work on filter for status-go, so definitely worth getting his input here. :)

danisharora099 · 2023-10-17T09:20:09Z

@jm-clius I agree with your overall idea, thanks for the comment!

re:

randomness (selecting random peers for connection/subscription, preferably with some peer cycling)

We decided to use the peer with the lowest ping for this, aiming for the fastest responses to protocol requests, so I'm not sure how useful randomness is in the context of js-waku.
Perhaps the strategy for js-waku could be to increase the score of the node with the lowest ping. cc @fryorcraken

fryorcraken · 2023-10-20T04:51:14Z

I'd suggest following @jm-clius's recommendation here and not introducing scoring.
I think prioritizing nodes with lowest latency first makes sense.
Then, if nodes are unreliable, we can disconnect and use another node.

danisharora099 · 2023-10-20T12:29:10Z

attributes that could contribute to defining "reliability":

remote peer should have relay enabled
latency
number of times a remote peer has dropped a connection with us
peers discovered through peer-exchange
- this also includes deprioritizing local storage peers in favour of peer-exchange peers

rough implementation (needs improvement):
whenever a protocol request is initiated:

get all connected peers
check that they support relay (prioritize these peers; use other peers for the remaining "seats")
sort them by latency and by reliability, gauged by their number of disconnections
use the top N peers to send the protocol request
observe these N peers:
- if any of them prove to be "unreliable", i.e. unable to process (?) our request or sending a faulty response,
- deprioritize them and cycle in a new peer

cc @waku-org/research @fryorcraken
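The ranking part of this rough implementation could translate to something like the sketch below: relay-capable peers fill the seats first, then lower latency wins, with fewer disconnections as a tie-breaker. The `PeerInfo` fields are hypothetical (how they are populated is not shown), and this ordering of the latency and disconnection keys is only one plausible weighting.

```typescript
type PeerId = string;

// Hypothetical per-peer metadata gathered by the local node.
interface PeerInfo {
  id: PeerId;
  supportsRelay: boolean;
  latencyMs: number;
  disconnections: number;
}

// Ranks connected peers and returns the top `n` for a protocol request.
function pickPeersForRequest(peers: PeerInfo[], n: number): PeerId[] {
  const ranked = [...peers].sort((a, b) => {
    // Relay-capable peers come first; others only fill remaining seats.
    if (a.supportsRelay !== b.supportsRelay) return a.supportsRelay ? -1 : 1;
    if (a.latencyMs !== b.latencyMs) return a.latencyMs - b.latencyMs;
    return a.disconnections - b.disconnections;
  });
  return ranked.slice(0, n).map((p) => p.id);
}
```

The observation step would feed back into these inputs, for example by incrementing `disconnections` for a peer that dropped or mishandled a request.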

fryorcraken · 2023-10-24T05:38:36Z

attributes that could contribute to defining "reliability":

* remote peer should have relay enabled

* latency

* number of times a remote peer has dropped a connection with us

* peers discovered through peer-exchange
  
  * this also includes deprioritizing local storage peers in favour of peer-exchange peers

IMO the most important criteria are missing from the list:

pushes as many messages as other peers, or more, on the filter subscription
does not return an error on filter requests such as ping
does not return an error on light push requests
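These three criteria could be tracked per peer with simple counters, as in the sketch below. `PeerHealthTracker`, its counters and the `isUnreliable` rule (any error, or trailing the best peer by more than `slack` messages) are illustrative assumptions.

```typescript
type PeerId = string;

// Hypothetical per-peer counters for the three criteria.
interface PeerHealth {
  messagesPushed: number; // messages received on the filter subscription
  filterErrors: number; // errors on filter requests such as ping
  lightPushErrors: number; // errors on light push requests
}

class PeerHealthTracker {
  private readonly health = new Map<PeerId, PeerHealth>();

  // Returns the counters for a peer, creating them on first use.
  private entry(peer: PeerId): PeerHealth {
    let h = this.health.get(peer);
    if (!h) {
      h = { messagesPushed: 0, filterErrors: 0, lightPushErrors: 0 };
      this.health.set(peer, h);
    }
    return h;
  }

  recordMessage(peer: PeerId): void { this.entry(peer).messagesPushed++; }
  recordFilterError(peer: PeerId): void { this.entry(peer).filterErrors++; }
  recordLightPushError(peer: PeerId): void { this.entry(peer).lightPushErrors++; }

  // Flags a peer that returned any error, or that pushed fewer messages
  // than the best-performing peer by more than `slack`.
  isUnreliable(peer: PeerId, slack = 0): boolean {
    const h = this.health.get(peer) ??
      { messagesPushed: 0, filterErrors: 0, lightPushErrors: 0 };
    if (h.filterErrors > 0 || h.lightPushErrors > 0) return true;
    let best = 0;
    for (const other of this.health.values()) {
      best = Math.max(best, other.messagesPushed);
    }
    return best - h.messagesPushed > slack;
  }
}
```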

danisharora099 · 2024-01-10T09:59:51Z

action plan:

if cache does not exist on startup:

DNS lookup, Peer Exchange & connect to fastest peers
cache peers in local storage
periodically "review" peers for reliability & if a node is "unreliable", swap it out for a random node
- update cache if necessary

if cache exists on startup:

connect to the cached peers
once connections are established, flush out the cache & use the new "fastest peers"
periodically "review" peers for reliability & if a node is "unreliable", swap it out for a random node
- update cache if necessary
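The start-up branch of this action plan might be sketched as below, persisting peers through a Storage-like key/value interface (in a browser this could be `window.localStorage`). The key name `waku:peers`, storing peers as plain address strings, and the `discoverFresh`/`dial` callbacks are all assumptions; `discoverFresh` stands in for DNS lookup plus peer exchange (or PX run on the cached peers), and the periodic review step is not shown.

```typescript
// Minimal subset of the Web Storage API used here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CACHE_KEY = "waku:peers"; // hypothetical key name

function loadCachedPeers(store: KeyValueStore): string[] {
  const raw = store.getItem(CACHE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed)
      ? parsed.filter((p): p is string => typeof p === "string")
      : [];
  } catch {
    return []; // Corrupt cache: behave as if none exists.
  }
}

function saveCachedPeers(store: KeyValueStore, peers: string[]): void {
  store.setItem(CACHE_KEY, JSON.stringify(peers));
}

// If a cache exists, dial the cached peers first; either way, discover fresh
// peers, dial them, and overwrite the cache so the same peers are not reused.
async function startUp(
  store: KeyValueStore,
  discoverFresh: () => Promise<string[]>,
  dial: (peers: string[]) => Promise<void>
): Promise<string[]> {
  const cached = loadCachedPeers(store);
  if (cached.length > 0) {
    await dial(cached);
  }
  const fresh = await discoverFresh();
  await dial(fresh);
  saveCachedPeers(store, fresh);
  return fresh;
}
```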

PRs:

use multiple peers for lightpush & filter instead of just one (currently): feat: lightpush & filter send requests to multiple peers #1779
introducing caching/local storage as a discovery module & storing peers: feat: local discovery #1811
connecting & using cached peers, cycling with fastest peers once established, updating cache

Handling unreliable peers can be tackled in a follow-up PR.

cc @jm-clius @waku-org/js-waku-developers please let me know if you have thoughts

fryorcraken · 2024-02-06T03:30:39Z

3. if cache exists on startup:

* connect to the cached peers

* once connections are established, flush out the cache & use to the new "fastest peers"

What peers? Do you mean you do DNS discovery and peer exchange?

* periodically "review" peers for reliability & if a node is "unreliable", swap it out for a random node
  
  * update cache if necessary

danisharora099 · 2024-02-07T06:22:48Z

What peers? Do you mean you do DNS discovery and peer exchange?

By "cache exists on startup" I mean the nodes that we were previously able to connect to healthily and that are stored in our local storage. We connect to them and run PX on them to find new peers; eventually we remove them and add the newly found peers instead, so we don't keep reconnecting to the same peers.

danisharora099 · 2024-03-06T10:39:24Z

Remaining:

cycling of peers in local storage after startup, when new peers are discovered
disconnection from unreliable peers and connection to new ones for light protocols (to be tackled after chore: protocol implementations in @waku/core should be as unopinionated as possible #1886): feat: peer management for protocols (with disconnection management) #2002

weboko · 2024-05-14T23:04:19Z

As the last work item in this issue is linked to the Reliability milestone, I am closing this and decoupling peer scoring into #2017.

danisharora099 · 2024-05-15T07:14:03Z

Another work item from this issue: #2018

fryorcraken added the track:restricted-run, milestone and E:2023-peer-mgmt labels Aug 8, 2023

This was referenced Aug 8, 2023

[Connection Manager] Improve fallback mechanism when remote peer rejects connection #1326

Closed

Peer Management: Connection and Disconnection #914

Closed

fryorcraken changed the title ~~[Milestone] Peer Management: Scoring and Persistence~~ [Milestone] Peer Management: Scoring, Redundancy and Persistence Aug 8, 2023

fryorcraken mentioned this issue Aug 8, 2023

Peer Management: Automated actions upon reconnection #1464

Open

4 tasks

danisharora099 mentioned this issue Aug 9, 2023

feat: allow good peers to be saved and dial on reloads waku-org/examples.waku.org#262

Closed

fryorcraken mentioned this issue Aug 14, 2023

[Epic] Peer Exchange is supported and used by default #1429

Closed

10 tasks

fryorcraken changed the title ~~[Milestone] Peer Management: Scoring, Redundancy and Persistence~~ [Epic] Peer Management: Scoring, Redundancy and Persistence Aug 24, 2023

fryorcraken added the epic label and removed the milestone label Aug 24, 2023

fryorcraken mentioned this issue Aug 29, 2023

[Milestone] Peer management strategy for relay and light nodes are defined and implemented waku-org/pm#33

Closed

5 tasks

chaitanyaprem mentioned this issue Aug 29, 2023

feat: SDK: Reliable Message Subscription API for lightClient protocols waku-org/go-waku#693

Closed

danisharora099 mentioned this issue Aug 29, 2023

feat!: set peer-exchange with default bootstrap #1469

Merged

1 task

fryorcraken mentioned this issue Aug 31, 2023

feat: SDK for using filter/lightpush #1507

Closed

fryorcraken changed the title ~~[Epic] Peer Management: Scoring, Redundancy and Persistence~~ feat: SDK for using filter/lightpush Sep 8, 2023

fryorcraken added the E:2.1: Production testing of existing protocols label (see https://github.com/waku-org/pm/issues/49 for details) and removed the E:2023-peer-mgmt label Sep 8, 2023

fryorcraken mentioned this issue Sep 12, 2023

[Epic] 2.1: Production testing of existing protocols waku-org/pm#49

Closed

5 tasks

fryorcraken changed the title ~~feat: SDK for using filter/lightpush~~ feat: SDK for redundant usage of filter/lightpush Sep 21, 2023

fryorcraken mentioned this issue Oct 9, 2023

Filter subscribe on 2 different nwaku nodes doesn't work reliable #1606

Closed

danisharora099 self-assigned this Oct 10, 2023

danisharora099 mentioned this issue Oct 10, 2023

create abstraction over ENR-related access between PeerStore #1648

Open

danisharora099 removed their assignment Oct 11, 2023

danisharora099 mentioned this issue Oct 12, 2023

multiple connections opened for the same peer #1459

Closed

danisharora099 self-assigned this Oct 13, 2023

fryorcraken removed the track:restricted-run and epic labels Oct 25, 2023

fryorcraken mentioned this issue Oct 27, 2023

feat: Plugin API #1686

Open

3 tasks

danisharora099 mentioned this issue Nov 1, 2023

bring back the ability to specify peer in lightpush.send #1695

Open

danisharora099 mentioned this issue Dec 14, 2023

feat: use metadata protocol for awaiting connection to remote peer #1759

Merged

danisharora099 mentioned this issue Jan 10, 2024

feat: lightpush & filter send requests to multiple peers #1779

Merged

1 task

chair28980 assigned adklempner Jan 10, 2024

danisharora099 unassigned adklempner Jan 16, 2024

danisharora099 mentioned this issue Jan 16, 2024

chore(tests): restructure & cleanup #1796

Merged

danisharora099 mentioned this issue Jan 24, 2024

feat: local discovery #1811

Merged

chair28980 mentioned this issue Mar 7, 2024

[Epic: js-waku] Reliability Protocol for Resource-Restricted Clients #2154

Closed

39 tasks

chair28980 added the E:js-waku Improve Reliability label and removed the E:2.1: Production testing of existing protocols label Mar 12, 2024

weboko mentioned this issue May 14, 2024

feat: peer scoring strategies for light protocols #2017

Open

weboko closed this as completed May 14, 2024

danisharora099 mentioned this issue May 15, 2024

feat: cycle peers in local storage #2018

Open

chair28980 added the E:js-waku Reliability Protocol for Resource-Restri label and removed the E:js-waku Improve Reliability label Jun 4, 2024
